Automatic detection of zones of interest in a video

ABSTRACT

A method at a computing system includes obtaining video of an environment including a plurality of objects; defining a zone including a portion of the environment; subsequent to the defining, detecting a motion event captured in the video occurring at least partially within the zone, wherein the motion event is associated with a first object of the plurality of objects; identifying an object type of the first object; and based on the object type of the first object, causing a notification of the motion event to be issued or not issued.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/431,710, filed Feb. 13, 2017, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The disclosed implementations relate generally to video monitoring, including, but not limited to, automatically detecting zones of interest in a field of view of a video feed.

BACKGROUND

The advancement of internet and mobile technologies has enabled the adoption of remote video surveillance by users. Users can now monitor an area under video surveillance using a website or a mobile application. Such websites or mobile apps typically allow a user to view live video and/or saved video recordings, but otherwise provide little or no additional information regarding the videos. A user may specify certain parts of an area under video surveillance as zones of interest, such that, for example, motion activity that occurs in these zones has notification priority. However, having the user specify the zones places the burden on the user. Furthermore, the user may be unaware of the relationships between motion activity detected in the video and particular areas in the field of view of the camera.

SUMMARY

Accordingly, there is a need for methods and systems for automatic detection and definition of zones of interest in live and/or saved video. Such methods and systems optionally complement or replace conventional methods for defining zones of interest in live and/or saved video.

In accordance with some implementations, a method includes, at a computing system with one or more processors and one or more memory components: obtaining video of an environment including a plurality of objects, where the video has a field of view; identifying one or more objects of the plurality of objects within the field of view; defining a zone of interest associated with a first object of the one or more objects, including identifying the zone of interest as one of an alerting zone or a suppression zone; subsequent to the defining, detecting one or more motion events captured in the video occurring at least partially within the zone of interest; when the zone of interest is an alerting zone, causing one or more notifications of the one or more motion events to be issued; and when the zone of interest is a suppression zone, suppressing notifications of the one or more motion events.
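The alerting/suppression branch of the method can be illustrated with a short sketch. It assumes, purely for illustration, that a zone of interest and the extent of a motion event are both axis-aligned rectangles in normalized frame coordinates, and that "at least partially within" means the two rectangles overlap. The names `Zone`, `ZoneKind`, and `notification_decision` are hypothetical and are not taken from the described implementations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized 0.0-1.0 (assumption)


class ZoneKind(Enum):
    ALERTING = "alerting"
    SUPPRESSION = "suppression"


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangular zone of interest associated with a detected object."""
    box: Box
    kind: ZoneKind

    def overlaps(self, other: Box) -> bool:
        # Two rectangles intersect unless one lies entirely to one side of the other,
        # so a motion event only partially inside the zone still counts.
        x0, y0, x1, y1 = self.box
        ox0, oy0, ox1, oy1 = other
        return ox0 < x1 and ox1 > x0 and oy0 < y1 and oy1 > y0


def notification_decision(zone: Zone, motion_box: Box) -> Optional[bool]:
    """True: issue a notification; False: suppress it; None: zone not involved."""
    if not zone.overlaps(motion_box):
        return None
    return zone.kind is ZoneKind.ALERTING


# Example: an alerting zone around a door and a suppression zone around a tree.
door = Zone((0.1, 0.1, 0.4, 0.6), ZoneKind.ALERTING)
tree = Zone((0.6, 0.0, 1.0, 0.5), ZoneKind.SUPPRESSION)
print(notification_decision(door, (0.3, 0.5, 0.5, 0.9)))  # True (partial overlap)
print(notification_decision(tree, (0.7, 0.1, 0.8, 0.2)))  # False (suppressed)
print(notification_decision(door, (0.5, 0.7, 0.6, 0.8)))  # None (outside the zone)
```

Returning `None` for motion outside the zone leaves the decision to other zones or default rules; how overlapping alerting and suppression zones would be reconciled is not specified here.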

In accordance with some implementations, a computing system includes one or more processors, one or more memory components, and one or more programs stored in the one or more memory components and configured for execution by the one or more processors. The one or more programs include instructions for: obtaining video of an environment including a plurality of objects, where the video has a field of view; identifying one or more objects of the plurality of objects within the field of view; defining a zone of interest associated with a first object of the one or more objects, including identifying the zone of interest as one of an alerting zone or a suppression zone; subsequent to the defining, detecting one or more motion events captured in the video occurring at least partially within the zone of interest; when the zone of interest is an alerting zone, causing one or more notifications of the one or more motion events to be issued; and when the zone of interest is a suppression zone, suppressing notifications of the one or more motion events.

In accordance with some implementations, a non-transitory computer-readable storage medium stores one or more programs. The one or more programs include instructions, which, when executed by a computing system with one or more processors, cause the computing system to perform operations including: obtaining video of an environment including a plurality of objects, where the video has a field of view; identifying one or more objects of the plurality of objects within the field of view; defining a zone of interest associated with a first object of the one or more objects, including identifying the zone of interest as one of an alerting zone or a suppression zone; subsequent to the defining, detecting one or more motion events captured in the video occurring at least partially within the zone of interest; when the zone of interest is an alerting zone, causing one or more notifications of the one or more motion events to be issued; and when the zone of interest is a suppression zone, suppressing notifications of the one or more motion events.

Thus, computing systems and electronic devices are provided with more efficient methods for detecting and defining zones of interest in live and/or saved video, thereby increasing the effectiveness, efficiency, and user satisfaction with such systems and devices. Such methods may complement or replace conventional methods for defining zones of interest in live and/or saved video.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the various described implementations, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

FIG. 1 is an example smart home environment, in accordance with some implementations.

FIG. 2 is a block diagram illustrating an example network architecture that includes a smart home network, in accordance with some implementations.

FIG. 3 illustrates a network-level view of an extensible devices and services platform with which the smart home environment of FIG. 1 is integrated, in accordance with some implementations.

FIG. 4 illustrates an abstracted functional view of the extensible devices and services platform of FIG. 3, with reference to a processing engine as well as devices of the smart home environment, in accordance with some implementations.

FIG. 5A is a representative operating environment in which a hub device server system interacts with client devices and hub devices communicatively coupled to local smart devices, in accordance with some implementations.

FIG. 5B is a representative operating environment in which a video server system interacts with client devices and hub devices communicatively coupled to local smart devices, in accordance with some implementations.

FIG. 6 is a block diagram illustrating a representative hub device, in accordance with some implementations.

FIG. 7A is a block diagram illustrating a representative hub device server system, in accordance with some implementations.

FIGS. 7B-7C are block diagrams illustrating a representative video server system, in accordance with some implementations.

FIG. 7D is a block diagram illustrating a representative client interface server, in accordance with some implementations.

FIG. 7E is a block diagram illustrating a representative camera interface server, in accordance with some implementations.

FIGS. 8A-8B are block diagrams illustrating a representative client device associated with a user account, in accordance with some implementations.

FIG. 9A is a block diagram illustrating a representative smart device, in accordance with some implementations.

FIG. 9B is a block diagram illustrating a representative video capturing device (e.g., a camera), in accordance with some implementations.

FIG. 10 is a block diagram illustrating a representative smart home provider server system, in accordance with some implementations.

FIG. 11A illustrates a representative system architecture, in accordance with some implementations.

FIG. 11B illustrates a representative processing pipeline, in accordance with some implementations.

FIGS. 12A-12E illustrate example user interfaces on a client device for presenting suggested zones, in accordance with some implementations.

FIG. 13 illustrates a flowchart diagram of a method for defining suggested zones, in accordance with some implementations.

FIGS. 14A-14G illustrate example screenshots of user interfaces on a client device, in accordance with some implementations.

Like reference numerals refer to corresponding parts throughout the several views of the drawings.

DESCRIPTION OF IMPLEMENTATIONS

Reference will now be made in detail to implementations, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the various described implementations. However, it will be apparent to one of ordinary skill in the art that the various described implementations may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the implementations.

FIG. 1 is an example smart home environment 100 in accordance with some implementations. Smart home environment 100 includes a structure 150 (e.g., a house, office building, garage, or mobile home) with various integrated devices. It will be appreciated that devices may also be integrated into a smart home environment 100 that does not include an entire structure 150, such as an apartment, condominium, or office space. Further, the smart home environment 100 may control and/or be coupled to devices outside of the actual structure 150. Indeed, one or more devices in the smart home environment 100 need not be physically within the structure 150. For example, a device controlling a pool heater 114 or irrigation system 116 may be located outside of the structure 150.

The depicted structure 150 includes a plurality of rooms 152, separated at least partly from each other via walls 154. The walls 154 may include interior walls or exterior walls. Each room may further include a floor 156 and a ceiling 158. Devices may be mounted on, integrated with, and/or supported by a wall 154, floor 156, or ceiling 158.

In some implementations, the integrated devices of the smart home environment 100 include intelligent, multi-sensing, network-connected devices that integrate seamlessly with each other in a smart home network (e.g., 202, FIG. 2) and/or with a central server or a cloud-computing system to provide a variety of useful smart home functions. The smart home environment 100 may include one or more intelligent, multi-sensing, network-connected thermostats 102 (hereinafter referred to as “smart thermostats 102”), one or more intelligent, network-connected, multi-sensing hazard detection units 104 (hereinafter referred to as “smart hazard detectors 104”), one or more intelligent, multi-sensing, network-connected entryway interface devices 106 and 120 (hereinafter referred to as “smart doorbells 106” and “smart door locks 120”), and one or more intelligent, multi-sensing, network-connected alarm systems 122 (hereinafter referred to as “smart alarm systems 122”).

In some implementations, the one or more smart thermostats 102 detect ambient climate characteristics (e.g., temperature and/or humidity) and control an HVAC system 103 accordingly. For example, a respective smart thermostat 102 includes an ambient temperature sensor.

The one or more smart hazard detectors 104 may include thermal radiation sensors directed at respective heat sources (e.g., a stove, oven, other appliances, a fireplace, etc.). For example, a smart hazard detector 104 in a kitchen 153 includes a thermal radiation sensor directed at a stove/oven 112. A thermal radiation sensor may determine the temperature of the respective heat source (or a portion thereof) at which it is directed and may provide corresponding blackbody radiation data as output.

The smart doorbell 106 and/or the smart door lock 120 may detect a person's approach to or departure from a location (e.g., an outer door), control doorbell/door locking functionality (e.g., receive user inputs from a portable electronic device 166-1 to actuate the bolt of the smart door lock 120), announce a person's approach or departure via audio or visual means, and/or control settings on a security system (e.g., to activate or deactivate the security system when occupants go and come).

The smart alarm system 122 may detect the presence of an individual within close proximity (e.g., using built-in IR sensors), sound an alarm (e.g., through a built-in speaker, or by sending commands to one or more external speakers), and send notifications to entities or users within/outside of the smart home environment 100. In some implementations, the smart alarm system 122 also includes one or more input devices or sensors (e.g., keypad, biometric scanner, NFC transceiver, microphone) for verifying the identity of a user, and one or more output devices (e.g., display, speaker). In some implementations, the smart alarm system 122 may also be set to an “armed” mode, such that detection of a trigger condition or event causes the alarm to be sounded unless a disarming action is performed.
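One way to model the armed-mode behavior is as a small state machine in which a trigger starts a grace period and the alarm sounds only if no disarming action arrives before that period elapses. The grace period (`grace_seconds`), the class name `SmartAlarm`, and its methods are illustrative assumptions; the description above does not say how long a user has to perform the disarming action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmartAlarm:
    """Hypothetical armed-mode model: trigger, then grace period, then alarm."""
    grace_seconds: float = 30.0          # assumed entry delay
    armed: bool = False
    pending_since: Optional[float] = None
    sounding: bool = False

    def arm(self) -> None:
        self.armed = True

    def disarm(self) -> None:
        # A disarming action cancels any pending or sounding alarm.
        self.armed = False
        self.pending_since = None
        self.sounding = False

    def trigger(self, now: float) -> None:
        # Triggers are ignored unless armed; the first trigger starts the countdown.
        if self.armed and self.pending_since is None:
            self.pending_since = now

    def tick(self, now: float) -> bool:
        """Advance time; return True once the alarm is sounding."""
        if (self.pending_since is not None
                and now - self.pending_since >= self.grace_seconds):
            self.sounding = True
        return self.sounding


alarm = SmartAlarm(grace_seconds=30.0)
alarm.arm()
alarm.trigger(now=0.0)
print(alarm.tick(now=10.0))  # False: still within the grace period
print(alarm.tick(now=30.0))  # True: no disarming action was performed
```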

In some implementations, the smart home environment 100 includes one or more intelligent, multi-sensing, network-connected wall switches 108 (hereinafter referred to as “smart wall switches 108”), along with one or more intelligent, multi-sensing, network-connected wall plug interfaces 110 (hereinafter referred to as “smart wall plugs 110”). The smart wall switches 108 may detect ambient lighting conditions, detect room-occupancy states, and control a power and/or dim state of one or more lights. In some instances, smart wall switches 108 may also control a power state or speed of a fan, such as a ceiling fan. The smart wall plugs 110 may detect occupancy of a room or enclosure and control supply of power to one or more wall plugs (e.g., such that power is not supplied to the plug if nobody is at home).

In some implementations, the smart home environment 100 of FIG. 1 includes a plurality of intelligent, multi-sensing, network-connected appliances 112 (hereinafter referred to as “smart appliances 112”), such as refrigerators, stoves, ovens, televisions, washers, dryers, lights, stereos, intercom systems, garage-door openers, floor fans, ceiling fans, wall air conditioners, pool heaters, irrigation systems, security systems, space heaters, window AC units, motorized duct vents, and so forth. In some implementations, when plugged in, an appliance may announce itself to the smart home network, such as by indicating what type of appliance it is, and it may automatically integrate with the controls of the smart home. Such communication by the appliance to the smart home may be facilitated by either a wired or wireless communication protocol. The smart home may also include a variety of non-communicating legacy appliances 140, such as old conventional washer/dryers, refrigerators, and the like, which may be controlled by smart wall plugs 110. The smart home environment 100 may further include a variety of partially communicating legacy appliances 142, such as infrared (“IR”) controlled wall air conditioners or other IR-controlled devices, which may be controlled by IR signals provided by the smart hazard detectors 104 or the smart wall switches 108.

In some implementations, the smart home environment 100 includes one or more network-connected cameras 118 that are configured to provide video monitoring and security in the smart home environment 100. In some implementations, cameras 118 also capture video when other conditions or hazards are detected, in order to provide visual monitoring of the smart home environment 100 when those conditions or hazards occur. The cameras 118 may be used to determine occupancy of the structure 150 and/or particular rooms 152 in the structure 150, and thus may act as occupancy sensors. For example, video captured by the cameras 118 may be processed to identify the presence of an occupant in the structure 150 (e.g., in a particular room 152). Specific individuals may be identified based, for example, on their appearance (e.g., height, face) and/or movement (e.g., their walk/gait). In some implementations, cameras 118 additionally include one or more sensors (e.g., IR sensors, motion detectors), input devices (e.g., a microphone for capturing audio), and output devices (e.g., a speaker for outputting audio).

The smart home environment 100 may additionally or alternatively include one or more other occupancy sensors (e.g., the smart doorbell 106, smart door locks 120, touch screens, IR sensors, microphones, ambient light sensors, motion detectors, smart nightlights 170, etc.). In some implementations, the smart home environment 100 includes radio-frequency identification (RFID) readers (e.g., in each room 152 or a portion thereof) that determine occupancy based on RFID tags located on or embedded in occupants. For example, RFID readers may be integrated into the smart hazard detectors 104.

The smart home environment 100 may include one or more sound and/or vibration sensors for detecting abnormal sounds and/or vibrations. These sensors may be integrated with any of the devices described above. The sound sensors detect sound above a decibel threshold. The vibration sensors detect vibration above a threshold directed at a particular area (e.g., vibration on a particular window when a force is applied to break the window).
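A minimal sketch of the two threshold checks follows. It assumes, for illustration only, that sound level is computed as the RMS of normalized audio samples expressed in dBFS (decibels relative to full scale), standing in for whatever calibrated decibel measure a real sensor would report, and that each monitored area (e.g., a particular window) has its own vibration limit. The thresholds and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else -math.inf


def is_abnormal_sound(samples: Sequence[float], threshold_db: float = -20.0) -> bool:
    # Flag sound only when its level exceeds the (assumed) decibel threshold.
    return rms_dbfs(samples) > threshold_db


def vibration_alerts(readings: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the areas whose vibration reading exceeds that area's own limit.

    Areas without a configured limit are never flagged.
    """
    return sorted(area for area, level in readings.items()
                  if level > limits.get(area, math.inf))


print(is_abnormal_sound([0.9, -0.9, 0.8]))  # True: about -1.2 dBFS
print(is_abnormal_sound([0.01] * 100))      # False: about -40 dBFS
print(vibration_alerts({"kitchen_window": 0.8, "front_window": 0.1},
                       {"kitchen_window": 0.5, "front_window": 0.5}))
```

Keeping a separate limit per area reflects the description's point that vibration is judged relative to a particular area rather than against one global threshold.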

Conditions detected by the devices described above (e.g., motion, sound,vibrations, hazards) may be referred to collectively as alert events.

The smart home environment 100 may also include communication with devices outside of the physical home but within a proximate geographical range of the home. For example, the smart home environment 100 may include a pool heater monitor 114 that communicates a current pool temperature to other devices within the smart home environment 100 and/or receives commands for controlling the pool temperature. Similarly, the smart home environment 100 may include an irrigation monitor 116 that communicates information regarding irrigation systems within the smart home environment 100 and/or receives control information for controlling such irrigation systems.

By virtue of network connectivity, one or more of the smart home devices of FIG. 1 may further allow a user to interact with the device even if the user is not proximate to the device. For example, a user may communicate with a device using a computer (e.g., a desktop computer, laptop computer, or tablet) or other portable electronic device 166 (e.g., a mobile phone, such as a smart phone). A webpage or application may be configured to receive communications from the user and control the device based on the communications and/or to present information about the device's operation to the user. For example, the user may view a current set point temperature for a device (e.g., a stove) and adjust it using a computer. The user may be in the structure during this remote communication or outside the structure.

As discussed above, users may control smart devices in the smart home environment 100 using a network-connected computer or portable electronic device 166. In some examples, some or all of the occupants (e.g., individuals who live in the home) may register their device 166 with the smart home environment 100. Such registration may be made at a central server to authenticate the occupant and/or the device as being associated with the home and to give permission to the occupant to use the device to control the smart devices in the home. An occupant may use their registered device 166 to remotely control the smart devices of the home, such as when the occupant is at work or on vacation. The occupant may also use their registered device to control the smart devices when the occupant is actually located inside the home, such as when the occupant is sitting on a couch inside the home. It should be appreciated that instead of or in addition to registering devices 166, the smart home environment 100 may make inferences about which individuals live in the home and are therefore occupants and which devices 166 are associated with those individuals. As such, the smart home environment may “learn” who is an occupant and permit the devices 166 associated with those individuals to control the smart devices of the home.

In some implementations, in addition to containing processing and sensing capabilities, devices 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, and/or 122 (collectively referred to as “the smart devices”) are capable of data communications and information sharing with other smart devices, a central server or cloud-computing system, and/or other devices that are network-connected. Data communications may be carried out using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.) and/or any of a variety of custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

In some implementations, the smart devices serve as wireless or wired repeaters. In some implementations, a first one of the smart devices communicates with a second one of the smart devices via a wireless router. The smart devices may further communicate with each other via a connection (e.g., network interface 160) to a network, such as the Internet 162. Through the Internet 162, the smart devices may communicate with a smart home provider server system 164 (also called a central server system and/or a cloud-computing system herein). The smart home provider server system 164 may be associated with a manufacturer, support entity, or service provider associated with the smart device(s). In some implementations, a user is able to contact customer support using a smart device itself rather than needing to use other communication means, such as a telephone or Internet-connected computer. In some implementations, software updates are automatically sent from the smart home provider server system 164 to smart devices (e.g., when available, when purchased, or at routine intervals).

In some implementations, the network interface 160 includes a conventional network device (e.g., a router), and the smart home environment 100 of FIG. 1 includes a hub device 180 that is communicatively coupled to the network(s) 162 directly or via the network interface 160. The hub device 180 is further communicatively coupled to one or more of the above intelligent, multi-sensing, network-connected devices (e.g., smart devices of the smart home environment 100). Each of these smart devices optionally communicates with the hub device 180 using one or more radio communication networks available at least in the smart home environment 100 (e.g., ZigBee, Z-Wave, Insteon, Bluetooth, Wi-Fi, and other radio communication networks). In some implementations, the hub device 180 and devices coupled with/to the hub device can be controlled and/or interacted with via an application running on a smart phone, household controller, laptop, tablet computer, game console, or similar electronic device. In some implementations, a user of such a controller application can view the status of the hub device or coupled smart devices, configure the hub device to interoperate with smart devices newly introduced to the home network, commission new smart devices, and adjust or view settings of connected smart devices, etc. In some implementations, the hub device extends the capabilities of low-capability smart devices to match the capabilities of highly capable smart devices of the same type, integrates the functionality of multiple different device types (even across different communication protocols), and is configured to streamline the adding of new devices and the commissioning of the hub device.

FIG. 2 is a block diagram illustrating an example network architecture 200 that includes a smart home network 202 in accordance with some implementations. In some implementations, the smart devices 204 in the smart home environment 100 (e.g., devices 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, and/or 122) combine with the hub device 180 to create a mesh network in smart home network 202. In some implementations, one or more smart devices 204 in the smart home network 202 operate as a smart home controller. Additionally and/or alternatively, hub device 180 operates as the smart home controller. In some implementations, a smart home controller has more computing power than other smart devices. In some implementations, a smart home controller processes inputs (e.g., from smart devices 204, electronic device 166, and/or smart home provider server system 164) and sends commands (e.g., to smart devices 204 in the smart home network 202) to control operation of the smart home environment 100. In some implementations, some of the smart devices 204 in the smart home network 202 (e.g., in the mesh network) are “spokesman” nodes (e.g., 204-1) and others are “low-powered” nodes (e.g., 204-9). Some of the smart devices in the smart home environment 100 are battery powered, while others have a regular and reliable power source, such as by connecting to wiring (e.g., to 120V line voltage wires) behind the walls 154 of the smart home environment. The smart devices that have a regular and reliable power source are referred to as “spokesman” nodes. These nodes are typically equipped with the capability of using a wireless protocol to facilitate bidirectional communication with a variety of other devices in the smart home environment 100, as well as with the smart home provider server system 164. In some implementations, one or more “spokesman” nodes operate as a smart home controller. On the other hand, the devices that are battery powered are the “low-power” nodes. These nodes tend to be smaller than spokesman nodes and typically only communicate using wireless protocols that require very little power, such as ZigBee, 6LoWPAN, etc.

In some implementations, some low-power nodes are incapable of bidirectional communication. These low-power nodes send messages, but they are unable to “listen”. Thus, other devices in the smart home environment 100, such as the spokesman nodes, cannot send information to these low-power nodes.

In some implementations, some low-power nodes are capable of only limited bidirectional communication. For example, other devices are able to communicate with the low-power nodes only during a certain time period.

As described, in some implementations, the smart devices serve as low-power and spokesman nodes to create a mesh network in the smart home environment 100. In some implementations, individual low-power nodes in the smart home environment regularly send out messages regarding what they are sensing, and the other low-powered nodes in the smart home environment, in addition to sending out their own messages, forward the messages, thereby causing the messages to travel from node to node (i.e., device to device) throughout the smart home network 202. In some implementations, the spokesman nodes in the smart home network 202, which are able to communicate using a relatively high-power communication protocol, such as IEEE 802.11, are able to switch to a relatively low-power communication protocol, such as IEEE 802.15.4, to receive these messages, translate the messages to other communication protocols, and send the translated messages to other spokesman nodes and/or the smart home provider server system 164 (using, e.g., the relatively high-power communication protocol). Thus, the low-powered nodes using low-power communication protocols are able to send and/or receive messages across the entire smart home network 202, as well as over the Internet 162 to the smart home provider server system 164. In some implementations, the mesh network enables the smart home provider server system 164 to regularly receive data from most or all of the smart devices in the home, make inferences based on the data, facilitate state synchronization across devices within and outside of the smart home network 202, and send commands to one or more of the smart devices to perform tasks in the smart home environment.
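The node-to-node relaying described above behaves like flooding: every node that hears a message forwards it once, so the message spreads across the mesh without looping forever. The sketch below is an illustration rather than the described protocol. It walks the flood breadth-first over a hypothetical adjacency map of which devices can hear each other, and it also models the send-only low-power nodes described earlier, which can originate messages but can neither receive nor relay them. All node names are made up.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Set


def flood(adjacency: Dict[str, List[str]], origin: str,
          send_only: AbstractSet[str] = frozenset()) -> Set[str]:
    """Return every node that ends up holding a message flooded from `origin`.

    Each node forwards a given message at most once (tracked via `received`),
    and nodes in `send_only` never receive or forward messages from others.
    """
    received = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor in send_only or neighbor in received:
                continue
            received.add(neighbor)
            queue.append(neighbor)  # the neighbor will relay it in turn
    return received


# Hypothetical radio neighborhood: which devices can hear each other.
radio_links = {
    "nightlight": ["hazard_detector"],
    "hazard_detector": ["nightlight", "thermostat", "door_sensor"],
    "door_sensor": ["hazard_detector"],
    "thermostat": ["hazard_detector", "provider_server"],
    "provider_server": ["thermostat"],
}
print(sorted(flood(radio_links, "nightlight", send_only={"door_sensor"})))
```

Here the thermostat plays the role of a spokesman node bridging to the provider server; the protocol translation it would perform is not modeled.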

As described, the spokesman nodes and some of the low-powered nodes are capable of “listening.” Accordingly, users, other devices, and/or the smart home provider server system 164 may communicate control commands to the low-powered nodes. For example, a user may use the electronic device 166 (e.g., a smart phone) to send commands over the Internet to the smart home provider server system 164, which then relays the commands to one or more spokesman nodes in the smart home network 202. The spokesman nodes may use a low-power protocol to communicate the commands to the low-power nodes throughout the smart home network 202, as well as to other spokesman nodes that did not receive the commands directly from the smart home provider server system 164.

In some implementations, a smart nightlight 170 (FIG. 1), which is an example of a smart device 204, is a low-power node. In addition to housing a light source, the smart nightlight 170 houses an occupancy sensor, such as an ultrasonic or passive IR sensor, and an ambient light sensor, such as a photoresistor or a single-pixel sensor that measures light in the room. In some implementations, the smart nightlight 170 is configured to activate the light source when its ambient light sensor detects that the room is dark and when its occupancy sensor detects that someone is in the room. In some implementations, the smart nightlight 170 is simply configured to activate the light source when its ambient light sensor detects that the room is dark. Further, in some implementations, the smart nightlight 170 includes a low-power wireless communication chip (e.g., a ZigBee chip) that regularly sends out messages regarding the occupancy of the room and the amount of light in the room, including instantaneous messages coincident with the occupancy sensor detecting the presence of a person in the room. As mentioned above, these messages may be sent wirelessly (e.g., using the mesh network) from node to node (i.e., smart device to smart device) within the smart home network 202 as well as over the Internet 162 to the smart home provider server system 164.
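The two nightlight behaviors differ only in whether occupancy is required, which the following sketch captures with a flag. It also shows one way the regular reports could be combined with an immediate report sent when occupancy first appears. The darkness threshold in lux, the message dictionaries, and all function names are illustrative assumptions.

```python
from typing import List


def light_should_be_on(ambient_lux: float, occupied: bool,
                       require_occupancy: bool = True,
                       dark_threshold_lux: float = 10.0) -> bool:
    """Turn the light on when the room is dark and, optionally, occupied."""
    is_dark = ambient_lux < dark_threshold_lux
    return is_dark and (occupied or not require_occupancy)


def reports(prev_occupied: bool, occupied: bool, ambient_lux: float,
            periodic_due: bool) -> List[dict]:
    """Build this cycle's outgoing messages (hypothetical message format)."""
    status = {"occupied": occupied, "lux": ambient_lux}
    messages = []
    if occupied and not prev_occupied:
        # Instantaneous message coincident with detecting a person.
        messages.append({"type": "occupancy_detected", **status})
    if periodic_due:
        messages.append({"type": "periodic", **status})
    return messages


print(light_should_be_on(2.0, occupied=False))                           # False
print(light_should_be_on(2.0, occupied=False, require_occupancy=False))  # True
print(reports(prev_occupied=False, occupied=True, ambient_lux=3.0, periodic_due=False))
```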

Other examples of low-power nodes include battery-operated versions of the smart hazard detectors 104. These smart hazard detectors 104 are often located in an area without access to constant and reliable power and may include any number and type of sensors, such as smoke/fire/heat sensors (e.g., thermal radiation sensors), carbon monoxide/dioxide sensors, occupancy/motion sensors, ambient light sensors, ambient temperature sensors, humidity sensors, and the like. Furthermore, smart hazard detectors 104 may send messages that correspond to each of the respective sensors to the other devices and/or the smart home provider server system 164, such as by using the mesh network as described above.

Examples of spokesman nodes include smart doorbells 106, smart thermostats 102, smart wall switches 108, and smart wall plugs 110. These devices are often located near and connected to a reliable power source, and therefore may include more power-consuming components, such as one or more communication chips capable of bidirectional communication in a variety of protocols.

In some implementations, the smart home environment 100 includes service robots 168 (FIG. 1) that are configured to carry out, in an autonomous manner, any of a variety of household tasks.

As explained above with reference to FIG. 1, in some implementations, the smart home environment 100 of FIG. 1 includes a hub device 180 that is communicatively coupled to the network(s) 162 directly or via the network interface 160. The hub device 180 is further communicatively coupled to one or more of the smart devices using a radio communication network that is available at least in the smart home environment 100. Communication protocols used by the radio communication network include, but are not limited to, ZigBee, Z-Wave, Insteon, EnOcean, Thread, OSIAN, Bluetooth Low Energy, and the like. In some implementations, the hub device 180 not only converts the data received from each smart device to meet the data format requirements of the network interface 160 or the network(s) 162, but also converts information received from the network interface 160 or the network(s) 162 to meet the data format requirements of the respective communication protocol associated with a targeted smart device. In some implementations, in addition to data format conversion, the hub device 180 further performs preliminary processing on the data received from the smart devices or the information received from the network interface 160 or the network(s) 162. For example, the hub device 180 can integrate inputs from multiple sensors/connected devices (including sensors/devices of the same and/or different types), perform higher-level processing on those inputs (e.g., to assess the overall environment and coordinate operation among the different sensors/devices), and/or provide instructions to the different devices based on the collection of inputs and programmed processing. It is also noted that in some implementations, the network interface 160 and the hub device 180 are integrated into one network device.
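To make the bidirectional format conversion concrete, the sketch below translates between a compact binary frame, of the kind a low-power sensor might send, and a JSON message for the network side, and back again. The frame layout (a one-byte device ID, a signed 16-bit temperature in hundredths of a degree Celsius, and a one-byte humidity percentage, little-endian) is entirely hypothetical and does not correspond to ZigBee, Z-Wave, or any other real protocol payload.

```python
import json
import struct

# Hypothetical device-side frame: uint8 device id, int16 centi-degrees C,
# uint8 relative humidity in percent; "<" means little-endian, no padding.
FRAME = struct.Struct("<BhB")


def frame_to_json(frame: bytes) -> str:
    """Device-to-network direction: unpack a binary frame into JSON."""
    device_id, centi_c, humidity = FRAME.unpack(frame)
    return json.dumps({"device": device_id,
                       "temperature_c": centi_c / 100,
                       "humidity_pct": humidity}, sort_keys=True)


def json_to_frame(payload: str) -> bytes:
    """Network-to-device direction: pack a JSON message into a binary frame."""
    msg = json.loads(payload)
    return FRAME.pack(msg["device"], round(msg["temperature_c"] * 100),
                      msg["humidity_pct"])


raw = FRAME.pack(7, 2150, 45)       # device 7 reports 21.50 C and 45 %
print(frame_to_json(raw))
print(json_to_frame(frame_to_json(raw)) == raw)  # round trip preserves the frame
```

Using `struct.Struct` keeps the layout in one place, so both directions of the conversion stay consistent if the hypothetical frame format changes.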
Functionality described herein is representative of particular implementations of smart devices, control application(s) running on representative electronic device(s) (such as a smart phone), hub device(s) 180, and server(s) coupled to hub device(s) via the Internet or other Wide Area Network. All or a portion of this functionality and associated operations can be performed by any elements of the described system. For example, all or a portion of the functionality described herein as being performed by an implementation of the hub device can be performed, in different system implementations, in whole or in part on the server, one or more connected smart devices and/or the control application, or different combinations thereof.

FIG. 3 illustrates a network-level view of an extensible devices and services platform with which the smart home environment of FIG. 1 is integrated, in accordance with some implementations. The extensible devices and services platform 300 includes smart home provider server system 164. Each of the intelligent, network-connected devices described with reference to FIG. 1 (e.g., 102, 104, 106, 108, 110, 112, 114, 116 and 118, identified simply as “devices” in FIGS. 2-4) may communicate with the smart home provider server system 164. For example, a connection to the Internet 162 may be established either directly (for example, using 3G/4G connectivity to a wireless carrier), or through a network interface 160 (e.g., a router, switch, gateway, hub device, or an intelligent, dedicated whole-home controller node), or through any combination thereof.

In some implementations, the devices and services platform 300 communicates with and collects data from the smart devices of the smart home environment 100. In addition, in some implementations, the devices and services platform 300 communicates with and collects data from a plurality of smart home environments across the world. For example, the smart home provider server system 164 collects home data 302 from the devices of one or more smart home environments 100, where the devices may routinely transmit home data or may transmit home data in specific instances (e.g., when a device queries the home data 302). Example collected home data 302 includes, without limitation, power consumption data, blackbody radiation data, occupancy data, HVAC settings and usage data, carbon monoxide levels data, carbon dioxide levels data, volatile organic compounds levels data, sleeping schedule data, cooking schedule data, inside and outside temperature and humidity data, television viewership data, inside and outside noise level data, pressure data, video data, etc.

In some implementations, the smart home provider server system 164 provides one or more services 304 to smart homes and/or third parties. Example services 304 include, without limitation, software updates, customer support, sensor data collection/logging, remote access, remote or distributed control, and/or use suggestions (e.g., based on collected home data 302) to improve performance, reduce utility cost, increase safety, etc. In some implementations, data associated with the services 304 is stored at the smart home provider server system 164, and the smart home provider server system 164 retrieves and transmits the data at appropriate times (e.g., at regular intervals, upon receiving a request from a user, etc.).

In some implementations, the extensible devices and services platform 300 includes a processing engine 306, which may be concentrated at a single server or distributed among several different computing entities without limitation. In some implementations, the processing engine 306 includes engines configured to receive data from the devices of smart home environments 100 (e.g., via the Internet 162 and/or a network interface 160), to index the data, to analyze the data and/or to generate statistics based on the analysis or as part of the analysis. In some implementations, the analyzed data is stored as derived home data 308.

Results of the analysis or statistics may thereafter be transmitted back to the device that provided home data used to derive the results, to other devices, to a server providing a webpage to a user of the device, or to other non-smart device entities. In some implementations, usage statistics, usage statistics relative to use of other devices, usage patterns, and/or statistics summarizing sensor readings are generated by the processing engine 306 and transmitted. The results or statistics may be provided via the Internet 162. In this manner, the processing engine 306 may be configured and programmed to derive a variety of useful information from the home data 302. A single server may include one or more processing engines.

The derived home data 308 may be used at different granularities for a variety of useful purposes, ranging from explicit programmed control of the devices on a per-home, per-neighborhood, or per-region basis (for example, demand-response programs for electrical utilities), to the generation of inferential abstractions that may assist on a per-home basis (for example, an inference may be drawn that the homeowner has left for vacation and so security detection equipment may be put on heightened sensitivity), to the generation of statistics and associated inferential abstractions that may be used for government or charitable purposes. For example, processing engine 306 may generate statistics about device usage across a population of devices and send the statistics to device users, service providers or other entities (e.g., entities that have requested the statistics and/or entities that have provided monetary compensation for the statistics).
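Rolling the same readings up to different granularities can be sketched as a group-and-average step. This is a minimal sketch only. It assumes each reading is tagged with hypothetical `home`, `neighborhood`, and `region` keys and a power-consumption value `kwh`, and it uses a plain mean as a stand-in for whatever statistics the processing engine 306 actually computes.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Reading(NamedTuple):
    home: str
    neighborhood: str
    region: str
    kwh: float  # hypothetical power-consumption reading


def aggregate(readings: List[Reading], level: str) -> Dict[str, float]:
    """Average consumption per home, per neighborhood, or per region."""
    if level not in ("home", "neighborhood", "region"):
        raise ValueError(f"unknown granularity: {level}")
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in readings:
        # Group each reading under the key for the requested granularity.
        groups[getattr(r, level)].append(r.kwh)
    return {key: mean(values) for key, values in groups.items()}
```

The same input can then feed a per-home inference and a per-region demand-response program by changing only the `level` argument.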

In some implementations, to encourage innovation and research and to increase products and services available to users, the devices and services platform 300 exposes a range of application programming interfaces (APIs) 310 to third parties, such as charities 314, governmental entities 316 (e.g., the Food and Drug Administration or the Environmental Protection Agency), academic institutions 318 (e.g., university researchers), businesses 320 (e.g., providing device warranties or service to related equipment, targeting advertisements based on home data), utility companies 324, and other third parties. The APIs 310 are coupled to and permit third-party systems to communicate with the smart home provider server system 164, including the services 304, the processing engine 306, the home data 302, and the derived home data 308. In some implementations, the APIs 310 allow applications executed by the third parties to initiate specific data processing tasks that are executed by the smart home provider server system 164, as well as to receive dynamic updates to the home data 302 and the derived home data 308.

For example, third parties may develop programs and/or applications (e.g., web applications or mobile applications) that integrate with the smart home provider server system 164 to provide services and information to users. Such programs and applications may be, for example, designed to help users reduce energy consumption, to preemptively service faulty equipment, to prepare for high service demands, to track past service performance, etc., and/or to perform other beneficial functions or tasks.

FIG. 4 illustrates an abstracted functional view 400 of the extensible devices and services platform 300 of FIG. 3, with reference to a processing engine 306 as well as devices of the smart home environment, in accordance with some implementations. Even though devices situated in smart home environments will have a wide variety of different individual capabilities and limitations, the devices may be thought of as sharing common characteristics in that each device is a data consumer 402 (DC), a data source 404 (DS), a services consumer 406 (SC), and a services source 408 (SS). Advantageously, in addition to providing control information used by the devices to achieve their local and immediate objectives, the extensible devices and services platform 300 may also be configured to use the large amount of data that is generated by these devices. In addition to enhancing or optimizing the actual operation of the devices themselves with respect to their immediate functions, the extensible devices and services platform 300 may be directed to “repurpose” that data in a variety of automated, extensible, flexible, and/or scalable ways to achieve a variety of useful objectives. These objectives may be predefined or adaptively identified based on, e.g., usage patterns, device efficiency, and/or user input (e.g., requesting specific functionality).
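The four shared roles (DC 402, DS 404, SC 406, SS 408) map naturally onto structural interfaces. The sketch below uses Python `typing.Protocol` classes. The method names and the `Thermostat` class are hypothetical and chosen only to show one device satisfying all four roles at once.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class DataConsumer(Protocol):      # DC 402
    def consume_data(self, data: Dict[str, Any]) -> None: ...


@runtime_checkable
class DataSource(Protocol):        # DS 404
    def produce_data(self) -> Dict[str, Any]: ...


@runtime_checkable
class ServicesConsumer(Protocol):  # SC 406
    def use_service(self, name: str) -> Any: ...


@runtime_checkable
class ServicesSource(Protocol):    # SS 408
    def offer_services(self) -> List[str]: ...


class Thermostat:
    """Hypothetical device that structurally satisfies all four roles."""

    def __init__(self) -> None:
        self.setpoint = 20.0

    def consume_data(self, data: Dict[str, Any]) -> None:
        # Accept control information from the platform.
        self.setpoint = data.get("setpoint", self.setpoint)

    def produce_data(self) -> Dict[str, Any]:
        # Report home data back to the platform.
        return {"setpoint": self.setpoint}

    def use_service(self, name: str) -> Any:
        return f"requested {name}"

    def offer_services(self) -> List[str]:
        return ["temperature_control"]
```

Structural typing fits the "thought of as sharing common characteristics" framing: a device does not need to inherit from a common base class to be treated as a DC, DS, SC, or SS.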

FIG. 4 shows processing engine 306 as including a number of processing paradigms 410. In some implementations, processing engine 306 includes a managed services paradigm 410a that monitors and manages primary or secondary device functions. The device functions may include ensuring proper operation of a device given user inputs, estimating that (e.g., and responding to an instance in which) an intruder is or is attempting to be in a dwelling, detecting a failure of equipment coupled to the device (e.g., a light bulb having burned out), implementing or otherwise responding to energy demand response events, providing a heat-source alert, and/or alerting a user of a current or predicted future event or characteristic. In some implementations, processing engine 306 includes an advertising/communication paradigm 410b that estimates characteristics (e.g., demographic information), desires and/or products of interest of a user based on device usage. Services, promotions, products or upgrades may then be offered or automatically provided to the user. In some implementations, processing engine 306 includes a social paradigm 410c that uses information from a social network, provides information to a social network (for example, based on device usage), and/or processes data associated with user and/or device interactions with the social network platform. For example, a user's status as reported to their trusted contacts on the social network may be updated to indicate when the user is home based on light detection, security system inactivation or device usage detectors. As another example, a user may be able to share device-usage statistics with other users. In yet another example, a user may share HVAC settings that result in low power bills and other users may download the HVAC settings to their smart thermostat 102 to reduce their power bills.

In some implementations, processing engine 306 includes a challenges/rules/compliance/rewards paradigm 410d that informs a user of challenges, competitions, rules, compliance regulations and/or rewards and/or that uses operation data to determine whether a challenge has been met, a rule or regulation has been complied with and/or a reward has been earned. The challenges, rules, and/or regulations may relate to efforts to conserve energy, to live safely (e.g., reducing the occurrence of heat-source alerts or reducing exposure to toxins or carcinogens), to conserve money and/or equipment life, to improve health, etc. For example, one challenge may involve participants turning down their thermostat by one degree for one week. Those participants that successfully complete the challenge are rewarded, such as with coupons, virtual currency, status, etc. Regarding compliance, an example involves a rental-property owner making a rule that no renters are permitted to access certain of the owner's rooms. The devices in the room having occupancy sensors may send updates to the owner when the room is accessed.
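Deciding whether the one-degree, one-week thermostat challenge has been met reduces to a check over recent operation data. The sketch below assumes, hypothetically, that the operation data is a list of daily average setpoints and that "turning down by one degree" means every day in the window is at least one degree below a pre-challenge baseline. The actual rule encoding is not specified in the text.

```python
from typing import Sequence


def challenge_met(baseline_setpoint: float,
                  daily_setpoints: Sequence[float],
                  reduction: float = 1.0,
                  days: int = 7) -> bool:
    """Return True if each of the last `days` daily setpoints is at least
    `reduction` degrees below the baseline (hypothetical rule encoding)."""
    if len(daily_setpoints) < days:
        # Not enough operation data yet to cover the full challenge window.
        return False
    recent = daily_setpoints[-days:]
    return all(s <= baseline_setpoint - reduction for s in recent)
```

A single day above the threshold fails the challenge, which matches the "for one week" wording more strictly than a weekly average would.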

In some implementations, processing engine 306 integrates or otherwise uses extrinsic information 412 from extrinsic sources to improve the functioning of one or more processing paradigms. Extrinsic information 412 may be used to interpret data received from a device, to determine a characteristic of the environment near the device (e.g., outside a structure that the device is enclosed in), to determine services or products available to the user, to identify a social network or social-network information, to determine contact information of entities (e.g., public-service entities such as an emergency-response team, the police or a hospital) near the device, to identify statistical or environmental conditions, trends or other information associated with a home or neighborhood, and so forth.

FIG. 5A illustrates a representative operating environment 500 in which a hub device server system 508 provides data processing for monitoring and facilitating review of alert events (e.g., motion events) in video streams captured by video cameras 118. As shown in FIG. 5A, the hub device server system 508 receives video data from video sources 522 (including cameras 118) located at various physical locations (e.g., inside homes, restaurants, stores, streets, parking lots, and/or the smart home environments 100 of FIG. 1). Each video source 522 may be bound to one or more user (e.g., reviewer) accounts, and the hub device server system 508 provides video monitoring data for the video source 522 to client devices 504 associated with the reviewer accounts. For example, the portable electronic device 166 is an example of a client device 504.

In some implementations, the smart home provider server system 164 or a component thereof serves as the hub device server system 508; the hub device server system 508 is a part or component of the smart home provider server system 164. In some implementations, the hub device server system 508 is a dedicated video processing server that provides video processing services to video sources and client devices 504 independent of other services provided by the hub device server system 508. An example of a video processing server is described below with reference to FIG. 5B.

In some implementations, each of the video sources 522 includes one or more video cameras 118 that capture video and send the captured video to the hub device server system 508 substantially in real-time. In some implementations, each of the video sources 522 optionally includes a controller device (not shown) that serves as an intermediary between the one or more cameras 118 and the hub device server system 508. The controller device receives the video data from the one or more cameras 118, optionally performs some preliminary processing on the video data, and sends the video data to the hub device server system 508 on behalf of the one or more cameras 118 substantially in real-time. In some implementations, each camera has its own on-board processing capabilities to perform some preliminary processing on the captured video data before sending the processed video data (along with metadata obtained through the preliminary processing) to the controller device and/or the hub device server system 508.

In some implementations, a camera 118 of a video source 522 captures video at a first resolution (e.g., 720P and/or 1080P) and/or a first frame rate (e.g., 24 frames per second), and sends the captured video to the hub device server system 508 at both the first resolution (e.g., the original capture resolution(s), the high-quality resolution(s) such as 1080P and/or 720P) and the first frame rate, and at a second, different resolution (e.g., 180P) and/or a second frame rate (e.g., 5 frames per second or 10 frames per second). For example, the camera 118 captures a video 523-1 at 720P and/or 1080P resolution (the camera 118 may capture a video at 1080P and create a downscaled 720P version, or capture at both 720P and 1080P). The video source 522 creates a second (or third), rescaled (and optionally at a different frame rate than the version 523-1) version 525-1 of the captured video at 180P resolution, and transmits both the original captured version 523-1 (i.e., 1080P and/or 720P) and the rescaled version 525-1 (i.e., the 180P version) to the hub device server system 508 for storage. In some implementations, the rescaled version has a lower resolution, and optionally a lower frame rate, than the original captured video. The hub device server system 508 transmits the original captured version or the rescaled version to a client 504, depending on the context. For example, the hub device server system 508 transmits the rescaled version when transmitting multiple videos to the same client device 504 for concurrent monitoring by the user, and transmits the original captured version in other contexts. In some implementations, the hub device server system 508 downscales the original captured version to a lower resolution, and transmits the downscaled version.
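The context-dependent choice between the stored original version 523-1 and the rescaled version 525-1 can be sketched as a single rule. This is a minimal sketch assuming that "context" is reduced to the number of feeds the client is monitoring concurrently. The `StreamVersion` type and `select_version` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamVersion:
    label: str    # e.g. "523-1" (original) or "525-1" (rescaled)
    height: int   # vertical resolution, e.g. 1080 or 180
    fps: float    # frame rate, e.g. 24 or 5


def select_version(original: StreamVersion, rescaled: StreamVersion,
                   concurrent_streams: int) -> StreamVersion:
    """Send the low-resolution copy when the client monitors several feeds at once;
    otherwise send the original captured version."""
    return rescaled if concurrent_streams > 1 else original
```

Because both versions are already stored, the server only selects between them; no transcoding is needed at request time in this variant.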

In some implementations, a camera 118 of a video source 522 captures video at a first resolution (e.g., 720P and/or 1080P) and/or a first frame rate, and sends the captured video to the hub device server system 508 at the first resolution (e.g., the original capture resolution(s); the high-quality resolution(s) such as 1080P and/or 720P) and first frame rate for storage. When the hub device server system 508 transmits the video to a client device, the hub device server system 508 may downscale the video to a second, lower resolution (e.g., 180P) and/or second, lower frame rate for the transmission, depending on the context. For example, the hub device server system 508 transmits the downscaled version when transmitting multiple videos to the same client device 504 for concurrent monitoring by the user, and transmits the original captured version in other contexts.
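Downscaling at transmission time involves two calculations: the target frame size and which source frames to keep for the lower frame rate. The sketch below computes only those two things. The actual pixel scaling would be done by a video transcoder, which is not shown. The aspect-ratio-preserving scale, the even-width rounding, and the timestamp-based frame decimation are illustrative assumptions, not the method the text prescribes.

```python
from typing import List, Tuple


def downscale_dims(width: int, height: int, target_height: int) -> Tuple[int, int]:
    """Scale to target_height, keeping aspect ratio; round width down to an even number
    (many encoders require even dimensions)."""
    target_width = round(width * target_height / height)
    return target_width - (target_width % 2), target_height


def decimate_frames(num_frames: int, src_fps: float, dst_fps: float) -> List[int]:
    """Indices of source frames to keep so the output approximates dst_fps."""
    if dst_fps >= src_fps:
        return list(range(num_frames))
    keep: List[int] = []
    next_t = 0.0
    for i in range(num_frames):
        t = i / src_fps
        # Keep the first frame at or after each output-frame time slot.
        if t + 1e-9 >= next_t:
            keep.append(i)
            next_t += 1.0 / dst_fps
    return keep
```

For example, one second of 24 frames-per-second 1080P video reduced to 180P at 5 frames per second keeps frames 0, 5, 10, 15, and 20 at 320x180.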

As shown in FIG. 5A, in accordance with some implementations, each of the client devices 504 includes a client-side module 502. The client-side module 502 communicates with a server-side module 506 executed on the hub device server system 508 through the one or more networks 162. The client-side module 502 provides client-side functionalities for the event monitoring and review processing and communications with the server-side module 506. The server-side module 506 provides server-side functionalities for event monitoring and review processing for any number of client-side modules 502 each residing on a respective client device 504. The server-side module 506 also provides server-side functionalities for video processing and camera control for any number of the video sources 522, including any number of control devices and the cameras 118.

In some implementations, the server-side module 506 includes one or more processors 512, a video storage database 514, device and account databases 516, an I/O interface to one or more client devices 518, and an I/O interface to one or more video sources 520. The I/O interface to one or more clients 518 facilitates the client-facing input and output processing for the server-side module 506. In some implementations, the I/O interface to clients 518 or a transcoding proxy computer (not shown) rescales (e.g., downscales) and/or changes the frame rate of video for transmission to a client 504. The databases 516 store a plurality of profiles for reviewer accounts registered with the video processing server, where a respective user profile includes account credentials for a respective reviewer account, and one or more video sources linked to the respective reviewer account. The I/O interface to one or more video sources 520 facilitates communications with one or more video sources 522 (e.g., groups of one or more cameras 118 and associated controller devices). The video storage database 514 stores raw video data received from the video sources 522, as well as various types of metadata, such as motion events, event categories, event category models, event filters, and event masks, for use in data processing for event monitoring and review for each reviewer account. In some implementations, the video storage database 514 also includes a collection of curated and condensed video frames (e.g., extracted-frames video, described further below) covering hours or days of stored raw video. These frames facilitate fast, seamless user review/scrubbing, using a client-side module 502, through key events/cuepoints that occurred in those hours and days of stored video, without needing to download the raw video to a client device 504 or review it there directly.
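A reviewer-account profile in the databases 516 pairs account credentials with the video sources linked to that account. The sketch below models that relationship. It assumes, as a design choice not stated in the text, that credentials are stored as salted PBKDF2 hashes from Python's standard `hashlib` rather than in plain text. The class and method names are hypothetical.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReviewerProfile:
    account_id: str
    salt: bytes
    password_hash: bytes
    video_sources: List[str] = field(default_factory=list)  # linked video sources 522


class AccountDatabase:
    """Hypothetical in-memory stand-in for the device and account databases 516."""

    def __init__(self) -> None:
        self._profiles: Dict[str, ReviewerProfile] = {}

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, account_id: str, password: str, sources: List[str]) -> None:
        salt = os.urandom(16)
        self._profiles[account_id] = ReviewerProfile(
            account_id, salt, self._hash(password, salt), list(sources))

    def sources_for(self, account_id: str, password: str) -> List[str]:
        """Return the linked video sources if the credentials match."""
        p = self._profiles.get(account_id)
        if p is None or not hmac.compare_digest(p.password_hash, self._hash(password, p.salt)):
            raise PermissionError("invalid credentials")
        return list(p.video_sources)
```

`hmac.compare_digest` is used so that the comparison time does not reveal how many bytes of a guessed password hash matched.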

In some implementations, the server-side module 506 receives information regarding alert events detected by other smart devices 204 (e.g., hazards, sound, vibration, motion). In accordance with the alert event information, the server-side module 506 instructs one or more video sources 522 in the smart home environment 100 where the alert event is detected to capture video, and/or associates with the alert event any video received from the video sources 522 in the same smart home environment 100 that is contemporaneous or proximate in time with the alert event.
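Associating an alert event with "contemporaneous or proximate in time" video can be sketched as an interval-overlap query restricted to the same smart home environment. The ±30-second default window, the `Clip`/`AlertEvent` types, and the `clips_for_alert` function are all hypothetical. The text does not define how close in time "proximate" is.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Clip:
    source_id: str
    environment_id: str
    start: float  # seconds since epoch
    end: float


@dataclass(frozen=True)
class AlertEvent:
    kind: str            # e.g. "hazard", "sound", "vibration", "motion"
    environment_id: str
    timestamp: float


def clips_for_alert(alert: AlertEvent, clips: Sequence[Clip],
                    window: float = 30.0) -> List[Clip]:
    """Clips from the alert's environment that overlap [t - window, t + window]."""
    lo, hi = alert.timestamp - window, alert.timestamp + window
    return [c for c in clips
            if c.environment_id == alert.environment_id
            and c.start <= hi and c.end >= lo]  # standard interval-overlap test
```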

Examples of a representative client device 504 include, but are not limited to, a handheld computer, a wearable computing device, a personal digital assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a cellular telephone, a smart phone, an enhanced general packet radio service (EGPRS) mobile phone, a media player, a navigation device, a game console, a television, a remote control, a point-of-sale (POS) terminal, a vehicle-mounted computer, an ebook reader, or a combination of any two or more of these data processing devices or other data processing devices. For example, client devices 504-1, 504-2, and 504-m are a smart phone, a tablet computer, and a laptop computer, respectively.

Examples of the one or more networks 162 include local area networks (LAN) and wide area networks (WAN) such as the Internet. The one or more networks 162 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.

In some implementations, the hub device server system 508 is implemented on one or more standalone data processing apparatuses or a distributed network of computers. In some implementations, the hub device server system 508 also employs various virtual devices and/or services of third party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of the hub device server system 508. In some implementations, the hub device server system 508 includes, but is not limited to, a handheld computer, a tablet computer, a laptop computer, a desktop computer, or a combination of any two or more of these data processing devices or other data processing devices.

The server-client environment 500 shown in FIG. 5A includes both a client-side portion (e.g., the client-side module 502) and a server-side portion (e.g., the server-side module 506). The division of functionalities between the client and server portions of operating environment 500 can vary in different implementations. Similarly, the division of functionalities between the video source 522 and the hub device server system 508 can vary in different implementations. For example, in some implementations, client-side module 502 is a thin client that provides only user-facing input and output processing functions, and delegates all other data processing functionalities to a backend server (e.g., the hub device server system 508). Similarly, in some implementations, a respective one of the video sources 522 is a simple video capturing device that continuously captures and streams video data to the hub device server system 508 with no or limited local preliminary processing on the video data. Although many aspects of the present technology are described from the perspective of the hub device server system 508, the corresponding actions performed by the client device 504 and/or the video sources 522 would be apparent to one skilled in the art without any creative efforts. Similarly, some aspects of the present technology may be described from the perspective of the client device or the video source, and the corresponding actions performed by the video server would be apparent to one skilled in the art without any creative efforts. Furthermore, some aspects of the present technology may be performed by the hub device server system 508, the client device 504, and the video sources 522 cooperatively.

It should be understood that operating environment 500 that involves the hub device server system 508, the video sources 522 and the video cameras 118 is merely an example. Many aspects of operating environment 500 are generally applicable in other operating environments in which a server system provides data processing for monitoring and facilitating review of data captured by other types of electronic devices (e.g., smart thermostats 102, smart hazard detectors 104, smart doorbells 106, smart wall plugs 110, appliances 112 and the like).

The electronic devices, the client devices or the server system communicate with each other using the one or more communication networks 162. In an example smart home environment, two or more devices (e.g., the network interface device 160, the hub device 180, and the client devices 504-m) are located in close proximity to each other, such that they could be communicatively coupled in the same sub-network 162A via wired connections, a WLAN or a Bluetooth Personal Area Network (PAN). The Bluetooth PAN is optionally established based on classical Bluetooth technology or Bluetooth Low Energy (BLE) technology. This smart home environment further includes one or more other radio communication networks 162B through which at least some of the electronic devices of the video sources 522-n exchange data with the hub device 180. Alternatively, in some situations, some of the electronic devices of the video sources 522-n communicate with the network interface device 160 directly via the same sub-network 162A that couples devices 160, 180 and 504-m. In some implementations (e.g., in the network 162C), both the client device 504-m and the electronic devices of the video sources 522-n communicate directly via the network(s) 162 without passing through the network interface device 160 or the hub device 180.
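The three communication paths described above can be summarized as a small selection function. The text presents the paths as alternatives without ranking them. The preference order in this sketch (sub-network 162A first, then radio network 162B through the hub, otherwise direct over the network(s) 162) is purely an illustrative assumption, and the `Path`/`choose_path` names are hypothetical.

```python
from enum import Enum
from typing import Set


class Path(Enum):
    SUBNET_162A = "sub-network 162A, directly to network interface device 160"
    RADIO_162B_VIA_HUB = "radio network 162B, through hub device 180"
    DIRECT_162 = "network(s) 162, bypassing devices 160 and 180"


def choose_path(reachable: Set[str]) -> Path:
    """Pick a path for a video-source device given the networks it can reach.

    The preference order below is an assumption for illustration only.
    """
    if "162A" in reachable:
        return Path.SUBNET_162A
    if "162B" in reachable:
        return Path.RADIO_162B_VIA_HUB
    return Path.DIRECT_162
```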

In some implementations, during normal operation, the network interface device 160 and the hub device 180 communicate with each other to form a network gateway through which data are exchanged with the electronic devices of the video sources 522-n. As explained above, the network interface device 160 and the hub device 180 optionally communicate with each other via a sub-network 162A.

In some implementations, the hub device 180 is omitted, and the functionality of the hub device 180 is performed by the hub device server system 508, video server system 552, or smart home provider server system 164.

In some implementations, the hub device server system 508 is, or includes, a dedicated video processing server. FIG. 5B illustrates a representative operating environment 550 in which a video server system 552 serves as a dedicated video processing server and provides data processing for monitoring and facilitating review of alert events (e.g., motion events) in video streams captured by video cameras 118. As shown in FIG. 5B, the video server system 552 receives video data from video sources 522 (including cameras 118) located at various physical locations (e.g., inside homes, restaurants, stores, streets, parking lots, and/or the smart home environments 100 of FIG. 1). Each video source 522 may be bound to one or more user (e.g., reviewer) accounts, and the video server system 552 provides video monitoring data for the video source 522 to client devices 504 associated with the reviewer accounts. For example, the portable electronic device 166 is an example of the client device 504.

In some implementations, the smart home provider server system 164 or a component thereof serves as the video server system 552; the video server system 552 is a part or component of the smart home provider server system 164. In some implementations, the video server system 552 is separate from the smart home provider server system 164, and provides video processing services to video sources 522 and client devices 504 independent of other services provided by the smart home provider server system 164. In some implementations, the smart home provider server system 164 and the video server system 552 are separate but communicate information with each other to provide functionality to users. For example, a detection of a hazard may be communicated by the smart home provider server system 164 to the video server system 552, and the video server system 552, in accordance with the communication regarding the detection of the hazard, records, processes, and/or provides video associated with the detected hazard.

In some implementations, each of the video sources 522 includes one or more video cameras 118 that capture video and send the captured video to the video server system 552 substantially in real-time. In some implementations, each of the video sources 522 optionally includes a controller device (not shown) that serves as an intermediary between the one or more cameras 118 and the video server system 552. The controller device receives the video data from the one or more cameras 118, optionally performs some preliminary processing on the video data, and sends the video data to the video server system 552 on behalf of the one or more cameras 118 substantially in real-time. In some implementations, each camera has its own on-board processing capabilities to perform some preliminary processing on the captured video data before sending the processed video data (along with metadata obtained through the preliminary processing) to the controller device and/or the video server system 552.

In some implementations, a camera 118 of a video source 522 captures video at a first resolution (e.g., 720P and/or 1080P) and/or a first frame rate (e.g., 24 frames per second), and sends the captured video to the video server system 552 at both the first resolution (e.g., the original capture resolution(s), the high-quality resolution(s)) and the first frame rate, and at a second, different resolution (e.g., 180P) and/or a second frame rate (e.g., 5 frames per second or 10 frames per second). For example, the camera 118 captures a video 523-1 at 720P and/or 1080P resolution (the camera 118 may capture a video at 1080P and create a downscaled 720P version, or capture at both 720P and 1080P). The video source 522 creates a second (or third), rescaled (and optionally at a different frame rate than the version 523-1) version 525-1 of the captured video at 180P resolution, and transmits both the original captured version 523-1 (i.e., 1080P and/or 720P) and the rescaled version 525-1 (i.e., the 180P version) to the video server system 552 for storage. In some implementations, the rescaled version has a lower resolution, and optionally a lower frame rate, than the original captured video. The video server system 552 transmits the original captured version or the rescaled version to a client 504, depending on the context. For example, the video server system 552 transmits the rescaled version when transmitting multiple videos to the same client device 504 for concurrent monitoring by the user, and transmits the original captured version in other contexts. In some implementations, the video server system 552 downscales the original captured version to a lower resolution, and transmits the downscaled version.

In some implementations, a camera 118 of a video source 522 captures video at a first resolution (e.g., 720P and/or 1080P) and/or a first frame rate, and sends the captured video to the video server system 552 at the first resolution (e.g., the original capture resolution(s), the high-quality resolution(s) such as 1080P and/or 720P) and the first frame rate for storage. When the video server system 552 transmits the video to a client device, the video server system 552 may downscale the video to a second, lower resolution (e.g., 180P) and/or second, lower frame rate for the transmission, depending on the context. For example, the video server system 552 transmits the downscaled version when transmitting multiple videos to the same client device 504 for concurrent monitoring by the user, and transmits the original captured version in other contexts.

As shown in FIG. 5B, in accordance with some implementations, each of the client devices 504 includes a client-side module 502. The client-side module 502 communicates with the video server system 552 through the one or more networks 162. In some implementations, the video server system 552 includes a video server 554, a client interface server 556, and a camera interface server 558. In some implementations, the video server 554 includes the server-side module 506 and its components and modules (FIG. 5A) or one or more respective components and/or modules of the server-side module 506. The client-side module 502 provides client-side functionalities for event monitoring and review processing and for communications with the video server system 552. The video server system 552 provides server-side functionalities for event monitoring and review processing for any number of client-side modules 502, each residing on a respective client device 504. The video server system 552 also provides server-side functionalities for video processing and camera control for any number of the video sources 522, including any number of control devices and the cameras 118.

In some implementations, the video server 554 includes one or more processors 512, a video storage database 514, and device and account databases 516. In some implementations, the video server system 552 also includes a client interface server 556 and a camera interface server 558. The client interface server 556 provides an I/O interface to one or more client devices 504, and the camera interface server 558 provides an I/O interface to one or more video sources 522. The client interface server 556 facilitates the client-facing input and output processing for the video server system 552. For example, the client interface server 556 generates web pages for reviewing and monitoring video captured by the video sources 522 in a web browser application at a client device 504. In some implementations, the client interface server 556 or a transcoding proxy computer rescales (e.g., downscales) and/or changes the frame rate of video for transmission to a client device 504. In some implementations, the client interface server 556 also serves as the transcoding proxy. The databases 516 store a plurality of profiles for reviewer accounts registered with the video processing server, where a respective user profile includes account credentials for a respective reviewer account and one or more video sources linked to the respective reviewer account. The camera interface server 558 facilitates communications with one or more video sources 522 (e.g., groups of one or more cameras 118 and associated controller devices). The video storage database 514 stores raw video data received from the video sources 522, as well as various types of metadata, such as motion events, event categories, event category models, event filters, event masks, alert events, and camera histories, for use in data processing for event monitoring and review for each reviewer account.
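A reviewer profile of the kind stored in the databases 516 pairs account credentials with the video sources linked to the account. The following is a minimal in-memory sketch; the class names, the salted PBKDF2 password hashing, and the iteration count are assumptions for illustration, since the description does not specify how credentials are stored.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field

PBKDF2_ITERATIONS = 100_000  # assumed work factor, not from the description


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               PBKDF2_ITERATIONS)


@dataclass
class ReviewerProfile:
    """Hypothetical reviewer-account profile: credentials plus linked sources."""
    account_id: str
    salt: bytes
    password_hash: bytes
    linked_sources: set[str] = field(default_factory=set)


class AccountDatabase:
    """In-memory stand-in for the device and account databases 516."""

    def __init__(self) -> None:
        self._profiles: dict[str, ReviewerProfile] = {}

    def register(self, account_id: str, password: str) -> ReviewerProfile:
        if account_id in self._profiles:
            raise ValueError(f"account {account_id!r} already exists")
        salt = os.urandom(16)
        profile = ReviewerProfile(account_id, salt, hash_password(password, salt))
        self._profiles[account_id] = profile
        return profile

    def link_source(self, account_id: str, source_id: str) -> None:
        """Associate a video source (e.g. a camera) with an account."""
        self._profiles[account_id].linked_sources.add(source_id)

    def login(self, account_id: str, password: str) -> set[str]:
        """Check credentials and return the account's linked video sources."""
        profile = self._profiles.get(account_id)
        if profile is None:
            raise PermissionError("invalid credentials")
        candidate = hash_password(password, profile.salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(candidate, profile.password_hash):
            raise PermissionError("invalid credentials")
        return set(profile.linked_sources)


db = AccountDatabase()
db.register("alice", "correct horse")
db.link_source("alice", "camera-front")
print(db.login("alice", "correct horse"))  # {'camera-front'}
```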

In some implementations, the video server system 552 receives information regarding alert events detected by other smart devices 204 (e.g., hazards, sound, vibration, motion). In accordance with the alert event information, the video server system 552 instructs one or more video sources 522 in the smart home environment 100 where the alert event is detected to capture video, and/or associates with the alert event video, received from the video sources 522 in the same smart home environment 100, that is contemporaneous or proximate in time with the alert event.
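One way to decide whether video is contemporaneous or proximate in time with an alert event is to test for overlap with a fixed time window around the alert. In this sketch the 30-second window, the record types, and the `associate_video` function are hypothetical choices, not values given in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertEvent:
    environment_id: str   # which smart home environment raised the alert
    kind: str             # e.g. "smoke", "sound", "vibration"
    timestamp: float      # seconds since the epoch


@dataclass(frozen=True)
class VideoSegment:
    environment_id: str
    source_id: str
    start: float          # seconds since the epoch
    end: float


def associate_video(alert: AlertEvent, segments: list[VideoSegment],
                    window_s: float = 30.0) -> list[VideoSegment]:
    """Return segments from the alert's environment that overlap the
    interval [timestamp - window_s, timestamp + window_s]."""
    lo = alert.timestamp - window_s
    hi = alert.timestamp + window_s
    return [seg for seg in segments
            if seg.environment_id == alert.environment_id
            and seg.start <= hi and seg.end >= lo]


alert = AlertEvent("home-1", "smoke", timestamp=1_000.0)
segments = [
    VideoSegment("home-1", "cam-kitchen", 990.0, 1_010.0),   # overlaps the alert
    VideoSegment("home-1", "cam-garage", 1_100.0, 1_200.0),  # too late
    VideoSegment("home-2", "cam-porch", 995.0, 1_005.0),     # other home
]
print([s.source_id for s in associate_video(alert, segments)])  # ['cam-kitchen']
```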

Examples of a representative client device 504 include, but are not limited to, a handheld computer, a wearable computing device, a personal digital assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a cellular telephone, a smart phone, an enhanced general packet radio service (EGPRS) mobile phone, a media player, a navigation device, a game console, a television, a remote control, a point-of-sale (POS) terminal, a vehicle-mounted computer, an ebook reader, or a combination of any two or more of these data processing devices or other data processing devices. For example, client devices 504-1, 504-2, and 504-m are a smart phone, a tablet computer, and a laptop computer, respectively.

Examples of the one or more networks 162 include local area networks (LAN) and wide area networks (WAN) such as the Internet. The one or more networks 162 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.

In some implementations, the video server system 552 is implemented on one or more standalone data processing apparatuses or a distributed network of computers. In some implementations, the video server 554, the client interface server 556, and the camera interface server 558 are each respectively implemented on one or more standalone data processing apparatuses or a distributed network of computers. In some implementations, the video server system 552 also employs various virtual devices and/or services of third-party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of the video server system 552. In some implementations, the video server system 552 includes, but is not limited to, a handheld computer, a tablet computer, a laptop computer, a desktop computer, or a combination of any two or more of these data processing devices or other data processing devices.

The server-client environment 550 shown in FIG. 5B includes both a client-side portion (e.g., the client-side module 502) and a server-side portion (e.g., the components and modules in the video server system 552). The division of functionalities between the client and server portions of operating environment 550 can vary in different implementations. Similarly, the division of functionalities between the video source 522 and the video server system 552 can vary in different implementations. For example, in some implementations, client-side module 502 is a thin client that provides only user-facing input and output processing functions, and delegates all other data processing functionalities to a backend server (e.g., the video server system 552). Similarly, in some implementations, a respective one of the video sources 522 is a simple video capturing device that continuously captures and streams video data to the video server system 552 with no or limited local preliminary processing on the video data. Although many aspects of the present technology are described from the perspective of the video server system 552, the corresponding actions performed by the client device 504 and/or the video sources 522 would be apparent to those skilled in the art without any creative efforts. Similarly, some aspects of the present technology may be described from the perspective of the client device or the video source, and the corresponding actions performed by the video server would be apparent to those skilled in the art without any creative efforts. Furthermore, some aspects of the present technology may be performed by the video server system 552, the client device 504, and the video sources 522 cooperatively.

It should be understood that operating environment 550, which involves the video server system 552, the video sources 522, and the video cameras 118, is merely an example. Many aspects of operating environment 550 are generally applicable in other operating environments in which a server system provides data processing for monitoring and facilitating review of data captured by other types of electronic devices (e.g., smart thermostats 102, smart hazard detectors 104, smart doorbells 106, smart wall plugs 110, appliances 112 and the like).

The electronic devices, the client devices, or the server system communicate with each other using the one or more communication networks 162. In an example smart home environment, two or more devices (e.g., the network interface device 160, the hub device 180, and the client devices 504-m) are located in close proximity to each other, such that they could be communicatively coupled in the same sub-network 162A via wired connections, a WLAN, or a Bluetooth Personal Area Network (PAN). The Bluetooth PAN is optionally established based on classical Bluetooth technology or Bluetooth Low Energy (BLE) technology. This smart home environment further includes one or more other radio communication networks 162B through which at least some of the electronic devices of the video sources 522-n exchange data with the hub device 180. Alternatively, in some situations, some of the electronic devices of the video sources 522-n communicate with the network interface device 160 directly via the same sub-network 162A that couples devices 160, 180 and 504-m. In some implementations (e.g., in the network 162C), both the client device 504-m and the electronic devices of the video sources 522-n communicate directly via the network(s) 162 without passing through the network interface device 160 or the hub device 180.

In some implementations, during normal operation, the network interface device 160 and the hub device 180 communicate with each other to form a network gateway through which data are exchanged with the electronic devices of the video sources 522-n. As explained above, the network interface device 160 and the hub device 180 optionally communicate with each other via a sub-network 162A.

In some implementations, a video source 522 may be private (e.g., its captured videos and history are accessible only to the associated user/account), public (e.g., its captured videos and history are accessible by anyone), or shared (e.g., its captured videos and history are accessible only to the associated user/account and other specific users/accounts with whom the associated user has authorized access (e.g., by sharing with the other specific users)). Whether a video source 522 is private, public, or shared is configurable by the associated user.
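The three sharing modes map naturally onto an access check. The sketch below is illustrative only; the `Visibility` enum, the `VideoSourceAccess` record, and the rule that a private source ignores its share list are assumptions rather than details from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"   # owner only
    PUBLIC = "public"     # anyone
    SHARED = "shared"     # owner plus explicitly authorized accounts


@dataclass
class VideoSourceAccess:
    owner: str
    visibility: Visibility = Visibility.PRIVATE
    shared_with: set[str] = field(default_factory=set)

    def can_view(self, user: Optional[str]) -> bool:
        """Return True if `user` (None for an anonymous viewer) may see the
        source's captured videos and history."""
        if self.visibility is Visibility.PUBLIC:
            return True
        if user is not None and user == self.owner:
            return True
        # Share grants only take effect while the source is in SHARED mode.
        return self.visibility is Visibility.SHARED and user in self.shared_with


cam = VideoSourceAccess("alice", Visibility.SHARED, {"bob"})
print(cam.can_view("bob"), cam.can_view("carol"))  # True False
```

Because the owner controls `visibility`, switching a shared source back to private revokes access for everyone in `shared_with` without deleting the share list.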

In some implementations, the camera 118 also performs preliminary motion detection on video captured by the camera 118. For example, the camera 118 analyzes the captured video for significant changes in pixels. When motion is detected by the preliminary motion detection, the camera 118 transmits information to the hub device server system 508 or video server system 552 informing the server system of the preliminary detected motion. The hub device server system 508 or video server system 552, in accordance with the information of the detected motion, may activate sending of a motion detection notification to a client device 504, log the preliminary detected motion as an alert event, and/or perform additional analysis of the captured video to confirm and/or classify the preliminary detected motion.
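Preliminary detection of significant changes in pixels is commonly done by frame differencing. The sketch below compares two grayscale frames (lists of rows of 0-255 intensities) and reports motion when the fraction of changed pixels meets a threshold. Frame differencing is a stand-in technique here, and both thresholds are assumed values rather than parameters from the description.

```python
Frame = list[list[int]]  # grayscale frame: rows of 0-255 intensities


def changed_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    total = changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def preliminary_motion(prev: Frame, curr: Frame,
                       pixel_threshold: int = 25,
                       area_threshold: float = 0.01) -> bool:
    """Flag motion when at least area_threshold of the frame changed."""
    return changed_fraction(prev, curr, pixel_threshold) >= area_threshold


still = [[50] * 10 for _ in range(10)]
moved = [row[:] for row in still]
for r in (4, 5):
    for c in (4, 5):
        moved[r][c] = 200                          # a 2x2 bright object appears
noisy = [[v + 10 for v in row] for row in still]   # small global flicker

print(preliminary_motion(still, moved))  # True  (4% of pixels changed)
print(preliminary_motion(still, noisy))  # False (changes below pixel_threshold)
```

The per-pixel threshold suppresses sensor noise and lighting flicker, while the area threshold keeps a few isolated changed pixels from triggering a report to the server.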

FIG. 6 is a block diagram illustrating a representative hub device 180 in accordance with some implementations. In some implementations, the hub device 180 includes one or more processing units (e.g., CPUs, ASICs, FPGAs, microprocessors, and the like) 602, one or more communication interfaces 604, memory 606, radios 640, and one or more communication buses 608 for interconnecting these components (sometimes called a chipset). In some implementations, the hub device 180 includes one or more input devices 610 such as one or more buttons for receiving input. In some implementations, the hub device 180 includes one or more output devices 612 such as one or more indicator lights, a sound card, a speaker, a small display for displaying textual information and error codes, etc. Furthermore, in some implementations, the hub device 180 uses a microphone and voice recognition or a camera and gesture recognition to supplement or replace physical input devices. In some implementations, the hub device 180 includes a location detection device 614, such as a GPS (Global Positioning System) or other geo-location receiver, for determining the location of the hub device 180.

The hub device 180 optionally includes one or more built-in sensors (not shown), including, for example, one or more thermal radiation sensors, ambient temperature sensors, humidity sensors, IR sensors, occupancy sensors (e.g., using RFID sensors), ambient light sensors, motion detectors, accelerometers, and/or gyroscopes.

The radios 640 enable one or more radio communication networks in the smart home environments, and allow the hub device 180 to communicate with smart devices. In some implementations, the radios 640 are capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.), custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), and/or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

Communication interfaces 604 include, for example, hardware capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.) and/or any of a variety of custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

Memory 606 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 606, or alternatively the non-volatile memory within memory 606, includes a non-transitory computer readable storage medium. In some implementations, memory 606, or the non-transitory computer readable storage medium of memory 606, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   Operating logic 616 including procedures for handling various basic system services and for performing hardware dependent tasks;
-   Hub device communication module 618 for connecting to and communicating with other network devices (e.g., network interface 160, such as a router that provides Internet connectivity, networked storage devices, network routing devices, server system 508, etc.) connected to one or more networks 162 via one or more communication interfaces 604 (wired or wireless);
-   Radio communication module 620 for connecting the hub device 180 to other devices (e.g., controller devices, smart devices 204 in smart home environment 100, client devices 504) via one or more radio communication devices (e.g., radios 640);
-   User interface module 622 for providing and displaying a user interface in which settings, captured data, and/or other data for one or more devices (e.g., smart devices 204 in smart home environment 100) can be configured and/or viewed; and
-   Hub device database 624, including but not limited to:
    -   Sensor information 6240 for storing and managing data received, detected, and/or transmitted by one or more sensors of the hub device 180 and/or one or more other devices (e.g., smart devices 204 in smart home environment 100);
    -   Device settings 6242 for storing operational settings for one or more devices (e.g., coupled smart devices 204 in smart home environment 100); and
    -   Communication protocol information 6244 for storing and managing protocol information for one or more protocols (e.g., standard wireless protocols, such as ZigBee, Z-Wave, etc., and/or custom or standard wired protocols, such as Ethernet).

Each of the above identified elements (e.g., modules stored in memory 606 of hub device 180) may be stored in one or more of the previously mentioned memory devices (e.g., the memory of any of the smart devices in smart home environment 100, FIG. 1), and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 606, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 606, optionally, stores additional modules and data structures not described above.

FIG. 7A is a block diagram illustrating the hub device server system 508 in accordance with some implementations. The hub device server system 508, typically, includes one or more processing units (CPUs) 702, one or more network interfaces 704 (e.g., including an I/O interface to one or more client devices and an I/O interface to one or more electronic devices), memory 706, and one or more communication buses 708 for interconnecting these components (sometimes called a chipset). Memory 706 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 706, optionally, includes one or more storage devices remotely located from one or more processing units 702. Memory 706, or alternatively the non-volatile memory within memory 706, includes a non-transitory computer readable storage medium. In some implementations, memory 706, or the non-transitory computer readable storage medium of memory 706, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   Operating system 710 including procedures for handling various basic system services and for performing hardware dependent tasks;
-   Network communication module 712 for connecting the hub device server system 508 to other systems and devices (e.g., client devices, electronic devices, and systems connected to one or more networks 162, FIGS. 1-5B) via one or more network interfaces 704 (wired or wireless);
-   Server-side module 714, which provides server-side functionalities for device control, data processing and data review, including but not limited to:
    -   Data receiving module 7140 for receiving data from electronic devices (e.g., video data from a camera 118, FIG. 1) via the hub device 180, and preparing the received data for further processing and storage in the data storage database 7160;
    -   Hub and device control module 7142 for generating and sending server-initiated control commands to modify operation modes of electronic devices (e.g., devices of a smart home environment 100), and/or receiving (e.g., from client devices 504) and forwarding user-initiated control commands to modify operation modes of the electronic devices;
    -   Data processing module 7144 for processing the data provided by the electronic devices, and/or preparing and sending processed data to a device for review (e.g., client devices 504 for review by a user); and
-   Server database 716, including but not limited to:
    -   Data storage database 7160 for storing data associated with each electronic device (e.g., each camera) of each user account, as well as data processing models, processed data results, and other relevant metadata (e.g., names of data results, location of electronic device, creation time, duration, settings of the electronic device, etc.) associated with the data, wherein (optionally) all or a portion of the data and/or processing associated with the hub device 180 or smart devices are stored securely;
    -   Account database 7162 for storing account information for user accounts, including user account information, information and settings for linked hub devices and electronic devices (e.g., hub device identifications), hub device specific secrets, relevant user and hardware characteristics (e.g., service tier, device model, storage capacity, processing capabilities, etc.), user interface settings, data review preferences, etc., where the information for associated electronic devices includes, but is not limited to, one or more device identifiers (e.g., MAC address and UUID), device specific secrets, and displayed titles; and
    -   Device Information Database 7164 for storing device information related to one or more hub devices, e.g., device identifiers and hub device specific secrets, independently of whether the corresponding hub devices have been associated with any user account.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 706, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 706, optionally, stores additional modules and data structures not described above.

FIGS. 7B-7C are block diagrams illustrating the video server 554 in accordance with some implementations. The video server 554, typically, includes one or more processing units (CPUs) 718, one or more network interfaces 720, memory 722, and one or more communication buses 724 for interconnecting these components (sometimes called a chipset). Memory 722 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 722, optionally, includes one or more storage devices remotely located from one or more processing units 718. Memory 722, or alternatively the non-volatile memory within memory 722, includes a non-transitory computer readable storage medium. In some implementations, memory 722, or the non-transitory computer readable storage medium of memory 722, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   Operating system 726 including procedures for handling various basic system services and for performing hardware dependent tasks;
-   Network communication module 728 for connecting the video server 554 to other systems and devices (e.g., client devices, electronic devices, and systems connected to one or more networks 162, FIGS. 1-5B) via one or more network interfaces 720 (wired or wireless);
-   Video server module 730, which provides server-side data processing and functionalities for video and event monitoring and review, including but not limited to:
    -   Account administration module 7300 for creating reviewer accounts, performing camera registration processing to establish associations between video sources and their respective reviewer accounts, and providing account login services to the client devices 504;
    -   Video data receiving module 7302 for receiving raw video data from the video sources 522, and preparing the received video data for event processing and long-term storage in the video storage database 514;
    -   Camera control module 7304 for generating and sending server-initiated control commands to modify the operation modes of the video sources 522, and/or receiving and forwarding user-initiated control commands to modify the operation modes of the video sources 522;
    -   Event detection module 7306 for detecting motion event candidates in video streams from each of the video sources 522, including motion track identification, false positive suppression, and event mask generation and caching;
    -   Event categorization module 7308 for categorizing motion events detected in received video streams;
    -   Zone creation module 73010 for generating zones of interest in accordance with user input;
    -   Person identification module 73012 for identifying characteristics associated with presence of humans in the received video streams;
    -   Filter application module 73014 for selecting event filters (e.g., event categories, zones of interest, a human filter, etc.) and applying the selected event filters to past and new motion events detected in the video streams;
    -   Zone monitoring module 73016 for monitoring motions within selected zones of interest and generating notifications for new motion events detected within the selected zones of interest, where the zone monitoring takes into account changes in surrounding context of the zones and is not confined within the selected zones of interest;
    -   Real-time motion event presentation module 73018 for dynamically changing characteristics of event indicators displayed in user interfaces as new event filters, such as new event categories or new zones of interest, are created, and for providing real-time notifications as new motion events are detected in the video streams;
    -   Event post-processing module 73020 for providing summary time-lapses for past motion events detected in video streams, and providing event and category editing functions to users for revising past event categorization results;
    -   Alert events module 73022 for receiving information on alert events (e.g., detected hazards, detected sounds, etc.), instructing cameras 118 to capture video in accordance with alert event information, and determining chronologies of alert events;
    -   Camera events module 73024 for associating captured video with alert events, from the same smart home environment 100, that are proximate or contemporaneous in time, and logging camera histories of camera events;
    -   Frame extraction module 73026 for extracting frames from raw video data from the video sources 522;
    -   Encoding module 73028 for encoding extracted-frames video using frames extracted by the frame extraction module 73026;
    -   Thumbnails module 73030 for selecting frames for, and generating, thumbnails for respective portions of video corresponding to events or alerts;
    -   Object detection module 73032 for detecting objects and corresponding contours in video feeds;
    -   Sources and sinks detection module 73034 for detecting sources and sinks of motion activity in video feeds; and
    -   Zone definition module 73036 for generating suggested zone definitions for detected objects;
-   Server database 732, including but not limited to:
    -   Video storage database 7320 storing raw video data associated with each of the video sources 522 (each including one or more cameras 118) of each reviewer account, as well as event categorization models (e.g., event clusters, categorization criteria, etc.), event categorization results (e.g., recognized event categories, and assignment of past motion events to the recognized event categories, representative events for each recognized event category, etc.), event masks for past motion events, video segments for each past motion event, preview video (e.g., sprites) of past motion events, and other relevant metadata (e.g., names of event categories, location of the cameras 118, creation time, duration, etc.) associated with the motion events;
    -   Account database 7324 for storing account information for user accounts, including user account information, information and settings for linked hub devices and electronic devices (e.g., hub device identifications), hub device specific secrets, relevant user and hardware characteristics (e.g., service tier, device model, storage capacity, processing capabilities, etc.), user interface settings, data review preferences, etc., where the information for associated electronic devices includes, but is not limited to, one or more device identifiers (e.g., MAC address and UUID), device specific secrets, and displayed titles;
    -   Device Information Database 7326 for storing device information related to one or more hub devices, e.g., device identifiers and hub device specific secrets, independently of whether the corresponding hub devices have been associated with any user account;
    -   Camera events history 7328 for storing per-camera histories of camera events, including alert events, chronologies of alert events, and references to associated videos in the video storage database 7320;
    -   Extracted frames and extracted-frames videos database 7330 for storing frames extracted from videos received from cameras 118 (e.g., extracted from high quality videos 7321) and for storing extracted-frames video generated by encoding module 73028 by encoding (e.g., in H.264 encoding format) series of extracted frames; and
    -   Event thumbnails 7332 for storing thumbnails representative of portions of videos corresponding to events or alerts;
-   Object images database(s) 734 for storing one or more databases (e.g., machine-trained databases) of images of objects; and
-   Suggested zone definitions 736 for storing suggested zone definitions.
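The zone monitoring and filtering described above, combined with the object-type check summarized in the abstract, can be sketched as a single decision: notify only if a motion event's track enters the zone and the object's type is one the zone is configured to report. The polygon representation of a zone, the ray-casting containment test, and the `notify_types` field are assumptions for illustration, not the modules' actual implementation.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]  # (x, y) in frame coordinates


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd (ray-casting) test; points exactly on an edge may go either way."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Zone:
    name: str
    polygon: list[Point]
    notify_types: set[str] = field(default_factory=set)  # hypothetical field


@dataclass
class MotionEvent:
    track: list[Point]   # sampled positions of the moving object
    object_type: str     # e.g. "person", "pet", "vehicle"


def should_notify(event: MotionEvent, zone: Zone) -> bool:
    """Notify if the event is at least partially in the zone and its object
    type is one the zone reports."""
    touches_zone = any(point_in_polygon(p, zone.polygon) for p in event.track)
    return touches_zone and event.object_type in zone.notify_types


porch = Zone("porch", [(0, 0), (10, 0), (10, 10), (0, 10)], {"person"})
print(should_notify(MotionEvent([(15, 5), (8, 5)], "person"), porch))  # True
print(should_notify(MotionEvent([(15, 5), (8, 5)], "pet"), porch))     # False
```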

Video data stored in the video storage database 7320 includes high-quality versions 7321 and low-quality versions 7322 of videos associated with each of the video sources 522. High-quality video 7321 includes video in relatively high resolutions (e.g., 720P and/or 1080P) and relatively high frame rates (e.g., 24 frames per second). Low-quality video 7322 includes video in relatively low resolutions (e.g., 180P) and relatively low frame rates (e.g., 5 frames per second, 10 frames per second).

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 722, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 722, optionally, stores additional modules and data structures not described above.

FIG. 7D is a block diagram illustrating the client interface server 556 in accordance with some implementations. The client interface server 556, typically, includes one or more processing units (CPUs) 734, one or more network interfaces 736, memory 738, and one or more communication buses 740 for interconnecting these components (sometimes called a chipset). Memory 738 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 738, optionally, includes one or more storage devices remotely located from one or more processing units 734. Memory 738, or alternatively the non-volatile memory within memory 738, includes a non-transitory computer readable storage medium. In some implementations, memory 738, or the non-transitory computer readable storage medium of memory 738, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   -   Operating system 742 including procedures for handling various basic system services and for performing hardware dependent tasks;
    -   Network communication module 744 for connecting the client interface server 556 to other systems and devices (e.g., client devices, video server 554, and systems connected to one or more networks 162, FIGS. 1-5B) via one or more network interfaces 736 (wired or wireless);
    -   Client interface module 746, which provides an I/O interface between client devices 504 and the video server 554, including but not limited to:
        -   Video feed module 7462 for transmitting videos from the video server system, or images extracted from those videos, to client devices as video streams or periodically refreshed images, and optionally transmitting particular views of videos or images from videos;
        -   Transcode module 7464 for rescaling (e.g., downscaling from 720P to 180P) video for transmission to client devices 504;
        -   Client input module 7466 for receiving and processing input commands from client devices (e.g., client device 504) to change the video view being transmitted or to control a video source 522;
        -   Camera view module 7468 for determining which views of videos or images from videos are to be transmitted to client devices; and
        -   User interface module 74610 for generating user interfaces (e.g., web pages), transmitted to client devices 504, for viewing video feeds and corresponding event histories.
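The rescaling step of the transcode module (e.g., from 720P to 180P) can be sketched as an aspect-ratio-preserving resize computation. The function name `rescale_dims` and the even-width rounding are illustrative assumptions, not details from this description. Even frame dimensions are a common requirement for encoders that use 4:2:0 chroma subsampling.

```python
def rescale_dims(width: int, height: int, target_height: int) -> tuple[int, int]:
    """Scale (width, height) to target_height while keeping the aspect ratio.

    The new width is rounded to the nearest even number (an assumption made
    here because many encoders require even dimensions for 4:2:0 video).
    """
    if width <= 0 or height <= 0 or target_height <= 0:
        raise ValueError("dimensions must be positive")
    scaled_width = width * target_height / height
    even_width = 2 * round(scaled_width / 2)
    return max(2, even_width), target_height


# Downscaling a 720P (1280x720) stream to 180P.
print(rescale_dims(1280, 720, 180))
```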

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 738, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 738, optionally, stores additional modules and data structures not described above.

FIG. 7E is a block diagram illustrating the camera interface server 558 in accordance with some implementations. The camera interface server 558, typically, includes one or more processing units (CPUs) 748, one or more network interfaces 750, memory 752, and one or more communication buses 754 for interconnecting these components (sometimes called a chipset). Memory 752 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 752, optionally, includes one or more storage devices remotely located from one or more processing units 748. Memory 752, or alternatively the non-volatile memory within memory 752, includes a non-transitory computer readable storage medium. In some implementations, memory 752, or the non-transitory computer readable storage medium of memory 752, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   -   Operating system 756 including procedures for handling various basic system services and for performing hardware dependent tasks;
    -   Network communication module 758 for connecting the camera interface server 558 to other systems and devices (e.g., client devices, video server 554, and systems connected to one or more networks 162, FIGS. 1-5B) via one or more network interfaces 750 (wired or wireless); and
    -   Camera interface module 760 for providing an I/O interface between video sources 522 and the video server 554.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 752, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 752, optionally, stores additional modules and data structures not described above.

In some implementations, at least some of the functions of the video server 554, client interface server 556, and camera interface server 558 are performed by the hub device server system 508, and the corresponding modules and sub-modules of these functions may be included in the hub device server system 508. In some implementations, at least some of the functions of the hub device server system 508 are performed by the video server 554, client interface server 556, and/or camera interface server 558, and the corresponding modules and sub-modules of these functions may be included in the video server 554, client interface server 556, and/or camera interface server 558.

FIGS. 8A-8B are block diagrams illustrating a representative client device 504 associated with a user (e.g., reviewer) account in accordance with some implementations. The client device 504, typically, includes one or more processing units (CPUs) 802, one or more network interfaces 804, memory 806, and one or more communication buses 808 for interconnecting these components (sometimes called a chipset). The client device also includes a user interface 810 and one or more built-in sensors 890 (e.g., accelerometer 892 and gyroscope 894). User interface 810 includes one or more output devices 812 that enable presentation of media content, including one or more speakers and/or one or more visual displays. User interface 810 also includes one or more input devices 814, including user interface components that facilitate user input such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls. Furthermore, the client device 504 optionally uses a microphone and voice recognition or a camera and gesture recognition to supplement or replace the keyboard. Further, the client device 504 optionally uses the accelerometer to detect changes in the orientation of the client device 504, and in particular applications and contexts interprets the change in orientation detected by the accelerometer as user input. In some implementations, the client device 504 includes one or more cameras, scanners, or photo sensor units for capturing images (not shown). In some implementations, the client device 504 includes a location detection device 816, such as a GPS (Global Positioning System) or other geo-location receiver, for determining the location of the client device 504.

Memory 806 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 806, optionally, includes one or more storage devices remotely located from one or more processing units 802. Memory 806, or alternatively the non-volatile memory within memory 806, includes a non-transitory computer readable storage medium. In some implementations, memory 806, or the non-transitory computer readable storage medium of memory 806, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   -   Operating system 818 including procedures for handling various basic system services and for performing hardware dependent tasks;
    -   Network communication module 820 for connecting the client device 504 to other systems and devices (e.g., hub device server system 508, video server system 552, video sources 522) connected to one or more networks 162 via one or more network interfaces 804 (wired or wireless);
    -   Presentation module 821 for enabling presentation of information (e.g., user interfaces for application(s) 824 and web browser module 823 or the client-side module 502, widgets, websites and web pages thereof, and/or games, audio and/or video content, text, etc.) at the client device 504 via the one or more output devices 812 (e.g., displays, speakers, etc.) associated with the user interface 810;
    -   Input processing module 822 for detecting one or more user inputs or interactions from one of the one or more input devices 814 and optionally the accelerometer 892 and interpreting the detected input or interaction;
    -   Web browser module 823 for navigating, requesting (e.g., via HTTP), and displaying websites and web pages thereof, including a web interface for logging into a reviewer account, controlling the video sources associated with the reviewer account, establishing and selecting event filters, and editing and reviewing motion events detected in the video streams of the video sources;
    -   One or more applications 824 for execution by the client device 504 (e.g., games, social network applications, smart home applications, and/or other web or non-web based applications), for controlling devices (e.g., sending commands, configuring settings, etc. to hub devices and/or other client or electronic devices), and for reviewing data captured by the devices (e.g., device status and settings, captured data, or other information regarding the hub device or other connected devices);
    -   User interface module 826 for providing and displaying a user interface in which settings, captured data, and/or other data for one or more devices (e.g., smart devices 204 in smart home environment 100) can be configured and/or viewed;
    -   Client-side module 502, which provides client-side data processing and functionalities for device control, data processing, data review, and monitoring and reviewing videos from one or more video sources and camera events, including but not limited to:
        -   Hub device and device control module 8280 for generating control commands for modifying an operating mode of the hub device or the electronic devices in accordance with user inputs;
        -   Data review module 8282 for providing user interfaces for reviewing data processed by the hub device server system 508 or video server system 552;
        -   Account registration module 8284 for establishing a reviewer account and registering one or more video sources with the hub device server system 508 or video server system 552;
        -   Camera setup module 8286 for setting up one or more video sources within a local area network, and enabling the one or more video sources to access the hub device server system 508 or video server system 552 on the Internet through the local area network;
        -   Camera control module 8288 for generating control commands for modifying an operating mode of the one or more video sources in accordance with user input;
        -   Event review interface module 82810 for providing user interfaces for reviewing event timelines, camera histories with camera events, editing event categorization results, selecting event filters, presenting real-time filtered motion events based on existing and newly created event filters (e.g., event categories, zones of interest, a human filter, etc.), presenting real-time notifications (e.g., pop-ups) for newly detected motion events, and presenting smart time-lapse of selected motion events;
        -   Zone creation module 82812 for providing a user interface for creating zones of interest for each video stream in accordance with user input, and sending the definitions of the zones of interest to the hub device server system 508 or video server system 552;
        -   Notification module 82814 for generating real-time notifications for all or selected alert events or motion events on the client device 504 outside of the event review user interface;
        -   Camera view module 82816 for generating control commands for modifying a view of a video transmitted to the client device 504 in accordance with user input;
        -   Timeline module 82818 for presenting information corresponding to video transmitted to the client device 504 in a timeline format, facilitating user manipulation of the information displayed in timeline format, and facilitating manipulation of display of the video in accordance with user manipulation of the information, including requesting additional video from the hub device server system 508 or video server system 552 in accordance with the user manipulation;
        -   Decoding module 82820 for decoding extracted-frames video; and
        -   Suggested zones module 82822 for presenting suggested zone definitions and associated events and processing user interaction with suggested zone definitions;
    -   Client data 830 storing data associated with the user account, electronic devices, and video sources 522, including, but not limited to:
        -   Account data 8300 storing information related to both user accounts loaded on the client device 504 and electronic devices (e.g., of the video sources 522) associated with the user accounts, wherein such information includes cached login credentials, hub device identifiers (e.g., MAC addresses and UUIDs), electronic device identifiers (e.g., MAC addresses and UUIDs), user interface settings, display preferences, authentication tokens and tags, password keys, etc.;
        -   Local data storage database 8302 for selectively storing raw or processed data associated with electronic devices (e.g., of the video sources 522, such as a camera 118); and
        -   Video data cache 8304 for caching video and image data from video feeds;
    -   Blurred image data 832;
    -   Blurring algorithms and parameters 834 for generating blurred image data 832 from video/image data in video data cache 8304;
    -   Cached extracted-frames videos 836 for storing or caching extracted-frames videos received from the video server 554;
    -   Cached event thumbnails 838 for storing or caching event thumbnails received from the video server 554; and
    -   Suggested zone definitions 840 for storing suggested zone definitions.
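The zone-of-interest filtering performed by the event review interface module 82810 could be sketched as follows. The sketch assumes, hypothetically, that zones and motion regions are axis-aligned rectangles in normalized frame coordinates. An event passes the filter when its motion region lies at least partially within any selected zone. The names `Rect`, `MotionEvent`, and `filter_by_zones` are illustrative only, and real zone definitions may be arbitrary polygons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in normalized frame coordinates (0.0 to 1.0)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0 or
                    self.y1 <= other.y0 or other.y1 <= self.y0)


@dataclass(frozen=True)
class MotionEvent:
    event_id: str
    bbox: Rect  # region of the frame in which motion was detected


def filter_by_zones(events: list[MotionEvent], zones: dict[str, Rect],
                    selected: set[str]) -> list[MotionEvent]:
    """Keep events whose motion region overlaps at least one selected zone."""
    active = [zones[name] for name in selected if name in zones]
    return [e for e in events if any(e.bbox.intersects(z) for z in active)]
```

For example, with a "driveway" zone covering the lower-left quadrant of the frame, an event whose motion region touches that quadrant is kept. An event lying entirely outside every selected zone is dropped.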

Video data cache 8304 includes cached video/image data for respective cameras associated with a user of the client device 504. For example, as shown in FIG. 8B, the video data cache 8304 includes cached video/image data 8304-1 for a first camera, cached video/image data 8304-2 for a second camera, up to cached video/image data 8304-p for a p-th camera. At a given moment, video data cache 8304 may not have cached video/image data for a given camera (e.g., due to the camera being newly associated with the user, due to the cache being cleared, or due to the cached video/image data having expired and been removed from the cache).
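A per-camera cache whose entries can be missing, cleared, or expired, as described above, might be sketched as follows. The fixed time-to-live policy, the `VideoDataCache` class, and the camera identifier strings are assumptions made for illustration. The clock is injectable so that the expiry behavior is easy to test.

```python
import time
from typing import Callable, Optional


class VideoDataCache:
    """Hypothetical per-camera cache of video/image data with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # camera_id -> (time stored, cached bytes)
        self._entries: dict[str, tuple[float, bytes]] = {}

    def put(self, camera_id: str, data: bytes) -> None:
        self._entries[camera_id] = (self._clock(), data)

    def get(self, camera_id: str) -> Optional[bytes]:
        """Return cached data, or None if it is absent or expired (expired entries are removed)."""
        entry = self._entries.get(camera_id)
        if entry is None:
            return None
        stored_at, data = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[camera_id]
            return None
        return data

    def clear(self) -> None:
        self._entries.clear()
```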

Blurred image data 832 includes sets of progressively blurred images for respective cameras. For example, as shown in FIG. 8B, the blurred image data 832 includes blurred image data (e.g., a set of progressively blurred images) 832-1 for the first camera, blurred image data 832-2 for the second camera, up to blurred image data 832-p for the p-th camera.
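One simple way to produce a set of progressively blurred images is to apply the same blur repeatedly, with each level starting from the previous one. The sketch below uses a 3x3 box blur with edge clamping on a grayscale image stored as nested lists. The box blur and the function names are assumptions, since the blurring algorithms and parameters 834 are not specified here.

```python
Image = list[list[float]]  # grayscale image, rows of pixel intensities


def box_blur(img: Image) -> Image:
    """Apply a 3x3 box blur, clamping neighbor coordinates at the image edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def progressive_blur(img: Image, levels: int) -> list[Image]:
    """Return [blur(img), blur(blur(img)), ...] with `levels` entries."""
    result: list[Image] = []
    current = img
    for _ in range(levels):
        current = box_blur(current)
        result.append(current)
    return result
```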

In some implementations, the client device 504 caches camera history as well as video data 8304. For example, whenever the client device 504 receives camera events history 7328 data from the video server 554, the most recent camera events history (e.g., history from the past two hours, the most recent 20 events) is cached at the client device (e.g., in client data 830). This cached history data may be accessed for quick display of camera history information.
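The history-trimming policy could be sketched as below. The text gives two alternative example limits: the past two hours, or the most recent 20 events. This hypothetical `recent_history` helper applies both limits together, and either one can be relaxed by passing a larger value.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CameraEvent:
    timestamp: float  # seconds since the epoch
    event_id: str


def recent_history(events: Iterable[CameraEvent], now: float,
                   window_seconds: float = 2 * 60 * 60,
                   max_events: int = 20) -> list[CameraEvent]:
    """Return at most max_events events from the last window_seconds, newest first."""
    cutoff = now - window_seconds
    recent = sorted((e for e in events if e.timestamp >= cutoff),
                    key=lambda e: e.timestamp, reverse=True)
    return recent[:max_events]
```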

In some implementations, the client-side module 502 and user interface module 826 are parts, modules, or components of a particular application 824 (e.g., a smart home management application).

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules or data structures, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 806, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 806, optionally, stores additional modules and data structures not described above.

In some implementations, at least some of the functions of the hub device server system 508 or the video server system 552 are performed by the client device 504, and the corresponding sub-modules of these functions may be located within the client device 504 rather than the hub device server system 508 or video server system 552. In some implementations, at least some of the functions of the client device 504 are performed by the hub device server system 508 or video server system 552, and the corresponding sub-modules of these functions may be located within the hub device server system 508 or video server system 552 rather than the client device 504. The client device 504 and the hub device server system 508 or video server system 552 shown in FIGS. 7A-8, respectively, are merely illustrative, and different configurations of the modules for implementing the functions described herein are possible in various implementations.

FIG. 9A is a block diagram illustrating a representative smart device 204 in accordance with some implementations. In some implementations, the smart device 204 (e.g., any devices of a smart home environment 100, FIGS. 1 and 2) includes one or more processing units (e.g., CPUs, ASICs, FPGAs, microprocessors, and the like) 902, one or more communication interfaces 904, memory 906, radios 940, and one or more communication buses 908 for interconnecting these components (sometimes called a chipset). In some implementations, user interface 910 includes one or more output devices 912 that enable presentation of media content, including one or more speakers and/or one or more visual displays. In some implementations, user interface 910 also includes one or more input devices 914, including user interface components that facilitate user input such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls. Furthermore, some smart devices 204 use a microphone and voice recognition or a camera and gesture recognition to supplement or replace the keyboard. In some implementations, the smart device 204 includes one or more image/video capture devices 918 (e.g., cameras, video cameras, scanners, photo sensor units). Optionally, the smart device 204 includes a location detection device 916, such as a GPS (Global Positioning System) or other geo-location receiver, for determining the location of the smart device 204.

The built-in sensors 990 include, for example, one or more thermal radiation sensors, ambient temperature sensors, humidity sensors, IR sensors, occupancy sensors (e.g., using RFID sensors), ambient light sensors, motion detectors, accelerometers, and/or gyroscopes.

The radios 940 enable one or more radio communication networks in the smart home environments, and allow a smart device 204 to communicate with other devices. In some implementations, the radios 940 are capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.), custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), and/or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

Communication interfaces 904 include, for example, hardware capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.) and/or any of a variety of custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

Memory 906 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 906, or alternatively the non-volatile memory within memory 906, includes a non-transitory computer readable storage medium. In some implementations, memory 906, or the non-transitory computer readable storage medium of memory 906, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   -   Operating logic 920 including procedures for handling various basic system services and for performing hardware dependent tasks;
    -   Device communication module 922 for connecting to and communicating with other network devices (e.g., network interface 160, such as a router that provides Internet connectivity, networked storage devices, network routing devices, server system 508, etc.) connected to one or more networks 162 via one or more communication interfaces 904 (wired or wireless);
    -   Radio communication module 924 for connecting the smart device 204 to other devices (e.g., controller devices, smart devices 204 in smart home environment 100, client devices 504) via one or more radio communication devices (e.g., radios 940);
    -   Input processing module 926 for detecting one or more user inputs or interactions from the one or more input devices 914 and interpreting the detected inputs or interactions;
    -   User interface module 928 for providing and displaying a user interface in which settings, captured data, and/or other data for one or more devices (e.g., the smart device 204, and/or other devices in smart home environment 100) can be configured and/or viewed;
    -   One or more applications 930 for execution by the smart device 204 (e.g., games, social network applications, smart home applications, and/or other web or non-web based applications) for controlling devices (e.g., executing commands, sending commands, and/or configuring settings of the smart device 204 and/or other client/electronic devices), and for reviewing data captured by devices (e.g., device status and settings, captured data, or other information regarding the smart device 204 and/or other client/electronic devices);
    -   Device-side module 932, which provides device-side functionalities for device control, data processing and data review, including but not limited to:
        -   Command receiving module 9320 for receiving, forwarding, and/or executing instructions and control commands (e.g., from a client device 504, from a smart home provider server system 164, from user inputs detected on the user interface 910, etc.) for operating the smart device 204;
        -   Data processing module 9322 for processing data captured or received by one or more inputs (e.g., input devices 914, image/video capture devices 918, location detection device 916), sensors (e.g., built-in sensors 990), interfaces (e.g., communication interfaces 904, radios 940), and/or other components of the smart device 204, and for preparing and sending processed data to a device for review (e.g., client devices 504 for review by a user); and
    -   Device data 934 storing data associated with devices (e.g., the smart device 204), including, but not limited to:
        -   Account data 9340 storing information related to user accounts loaded on the smart device 204, wherein such information includes cached login credentials, smart device identifiers (e.g., MAC addresses and UUIDs), user interface settings, display preferences, authentication tokens and tags, password keys, etc.; and
        -   Local data storage database 9342 for selectively storing raw or processed data associated with the smart device 204 (e.g., video surveillance footage captured by a camera 118).
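The receive, forward, or execute behavior attributed to the command receiving module 9320 could be sketched as a small dispatcher. Everything in this example is a hypothetical illustration of the idea rather than a described interface: the `Command` fields, the handler registration, and the forwarding callback are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Command:
    target_device_id: str
    name: str
    args: dict[str, Any] = field(default_factory=dict)


class CommandReceiver:
    """Hypothetical device-side command receiver: execute locally or forward."""

    def __init__(self, device_id: str, forward: Callable[[Command], None]) -> None:
        self.device_id = device_id
        self._forward = forward
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def receive(self, command: Command) -> Any:
        if command.target_device_id != self.device_id:
            # Not addressed to this device: pass it along (e.g., toward a hub or peer).
            self._forward(command)
            return None
        handler = self._handlers.get(command.name)
        if handler is None:
            raise KeyError(f"unsupported command: {command.name}")
        return handler(**command.args)
```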

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 906, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 906, optionally, stores additional modules and data structures not described above.

FIG. 9B is a block diagram illustrating a representative camera 118 in accordance with some implementations. In some implementations, the camera 118 includes one or more processing units (e.g., CPUs, ASICs, FPGAs, microprocessors, and the like) 942, one or more communication interfaces 944, memory 946, and one or more communication buses 948 for interconnecting these components (sometimes called a chipset). In some implementations, the camera 118 includes one or more input devices 950 such as one or more buttons for receiving input and one or more microphones. In some implementations, the camera 118 includes one or more output devices 952 such as one or more indicator lights, a sound card, a speaker, a small display for displaying textual information and error codes, playing audio, etc. In some implementations, the camera 118 optionally includes a location detection device 954, such as a GPS (Global Positioning System) or other geo-location receiver, for determining the location of the camera 118.

Communication interfaces 944 include, for example, hardware capable of data communications using any of a variety of custom or standard wireless protocols (e.g., IEEE 802.15.4, Wi-Fi, ZigBee, 6LoWPAN, Thread, Z-Wave, Bluetooth Smart, ISA100.11a, WirelessHART, MiWi, etc.) and/or any of a variety of custom or standard wired protocols (e.g., Ethernet, HomePlug, etc.), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

Memory 946 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 946, or alternatively the non-volatile memory within memory 946, includes a non-transitory computer readable storage medium. In some implementations, memory 946, or the non-transitory computer readable storage medium of memory 946, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   -   Operating system 956 including procedures for handling various basic system services and for performing hardware dependent tasks;
    -   Network communication module 958 for connecting the camera 118 to other computing devices (e.g., hub device server system 508, video server system 552, the client device 504, network routing devices, one or more controller devices, and networked storage devices) connected to the one or more networks 162 via the one or more communication interfaces 944 (wired or wireless);
    -   Video control module 960 for modifying the operation mode (e.g., zoom level, resolution, frame rate, recording and playback volume, lighting adjustment, AE and IR modes, etc.) of the camera 118, enabling/disabling the audio and/or video recording functions of the camera 118, changing the pan and tilt angles of the camera 118, resetting the camera 118, and/or the like;
    -   Video capturing module 964 for capturing and generating a video stream and sending the video stream to the hub device server system 508 or video server system 552 as a continuous feed or in short bursts, and optionally generating a rescaled version of the video stream and sending the video stream at both the original captured resolution and the rescaled resolution;
    -   Video caching module 966 for storing some or all captured video data locally at one or more local storage devices (e.g., memory, flash drives, internal hard disks, portable disks, etc.);
    -   Local video processing module 968 for performing preliminary processing of the captured video data locally at the camera 118, including, for example, compressing and encrypting the captured video data for network transmission, preliminary motion event detection, preliminary false positive suppression for motion event detection, preliminary motion vector generation, etc.;
    -   Camera data 970 storing data, including but not limited to:
        -   Camera settings 972, including network settings, camera operation settings, camera storage settings, etc.; and
        -   Video data 974, including video segments and motion vectors for detected motion event candidates to be sent to the hub device server system 508 or video server system 552;
    -   Object detection module 976 for detecting objects and corresponding contours in video feeds;
    -   Sources and sinks detection module 978 for detecting sources and sinks of motion activity in video feeds;
    -   Zone definition module 980 for generating suggested zone definitions for detected objects;
    -   Object images database(s) 982 for storing one or more databases (e.g., machine-trained databases) of images of objects; and
    -   Suggested zone definitions 984 for storing suggested zone definitions.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 946, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 946, optionally, stores additional modules and data structures not described above. Additionally, camera 118, being an example of a smart device 204, optionally includes components and modules included in smart device 204 as shown in FIG. 9A that are not shown in FIG. 9B.

FIG. 10 is a block diagram illustrating the smart home provider server system 164 in accordance with some implementations. The smart home provider server system 164, typically, includes one or more processing units (CPUs) 1002, one or more network interfaces 1004 (e.g., including an I/O interface to one or more client devices and an I/O interface to one or more electronic devices), memory 1006, and one or more communication buses 1008 for interconnecting these components (sometimes called a chipset). Memory 1006 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 1006, optionally, includes one or more storage devices remotely located from one or more processing units 1002. Memory 1006, or alternatively the non-volatile memory within memory 1006, includes a non-transitory computer readable storage medium. In some implementations, memory 1006, or the non-transitory computer readable storage medium of memory 1006, stores the following programs, modules, and data structures, or a subset or superset thereof:

-   Operating system 1010 including procedures for handling various basic system services and for performing hardware dependent tasks;
-   Network communication module 1012 for connecting the smart home provider server system 164 to other systems and devices (e.g., client devices, electronic devices, hub device server system 508, video server system 552, and systems connected to one or more networks 162, FIGS. 1-5B) via one or more network interfaces 1004 (wired or wireless);
-   Server-side module 1014, which provides server-side functionalities for device control, data processing and data review, including but not limited to:
    -   Data receiving module 10140 for receiving data from electronic devices (e.g., video data from a camera 118, FIG. 1), and preparing the received data for further processing and storage in the data storage database 10160;
    -   Device control module 10142 for generating and sending server-initiated control commands to modify operation modes of electronic devices (e.g., devices of a smart home environment 100), and/or receiving (e.g., from client devices 504) and forwarding user-initiated control commands to modify operation modes of the electronic devices;
    -   Data processing module 10144 for processing the data provided by the electronic devices, and/or preparing and sending processed data to a device for review (e.g., client devices 504 for review by a user); and
-   Server database 1016, including but not limited to:
    -   Data storage database 10160 for storing data associated with each electronic device (e.g., each camera) of each user account, as well as data processing models, processed data results, and other relevant metadata (e.g., names of data results, location of electronic device, creation time, duration, settings of the electronic device, etc.) associated with the data, wherein (optionally) all or a portion of the data and/or processing associated with the electronic devices are stored securely; and
    -   Account database 10162 for storing account information for user accounts, including user account information, information and settings for linked hub devices and electronic devices (e.g., hub device identifications), hub device specific secrets, relevant user and hardware characteristics (e.g., service tier, device model, storage capacity, processing capabilities, etc.), user interface settings, data review preferences, etc., where the information for associated electronic devices includes, but is not limited to, one or more device identifiers (e.g., MAC address and UUID), device specific secrets, and displayed titles.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 1006, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 1006, optionally, stores additional modules and data structures not described above.

Furthermore, in some implementations, the functions of any of the devices and systems described herein (e.g., hub device 180, hub device server system 508, video server system 552, client device 504, smart device 204, camera 118, smart home provider server system 164) are interchangeable with one another and may be performed by any other devices or systems, where the corresponding sub-modules of these functions may additionally and/or alternatively be located within and executed by any of the devices and systems. As one example, generation of user interfaces may be performed by the user interface module 74610 (which may be located at the client interface server 556 or at the video server 554) or by the user interface module 826, depending on whether the user is accessing the video feeds and corresponding histories through a web browser 823 or an application 824 (e.g., a dedicated smart home management application) at the client device 504. The devices and systems shown in and described with respect to FIGS. 6-10 are merely illustrative, and different configurations of the modules for implementing the functions described herein are possible in various implementations.

FIG. 11A illustrates a representative system architecture 1100 and FIG. 11B illustrates a corresponding data processing pipeline 1112.

In some implementations, the server system 508 or 552 includes functional modules for an event processor 11060, an event categorizer 11080, and a user-facing frontend 11100. The event processor 11060 (e.g., event detection module 7306, FIG. 7B) obtains the motion event candidates (e.g., by processing the video stream or by receiving the motion start information from the video source 522). The event categorizer 11080 (e.g., event categorization module 7308, FIG. 7B) categorizes the motion event candidates into different event categories. The user-facing frontend 11100 (e.g., alert events module 73022, FIG. 7B) generates event alerts and facilitates review of the motion events by a reviewer through a review interface on a client device 504. The user-facing frontend 11100 also receives user edits on the event categories, user preferences for alerts and event filters, and zone definitions for zones of interest. The event categorizer optionally revises event categorization models and results based on the user edits received by the user-facing frontend. The server system 508/552 also includes a video and source data database 1106, event categorization modules database 1108, and event data and event masks database 1110. In some implementations, each of these databases is part of the server database 732.

In some implementations, the server system 508/552 also includes object and zone detectors 11300. The object and zone detectors (e.g., object detection module 73032, sources and sinks detection module 73034, zone definition module 73036; FIG. 7B) detect objects in scenes captured on video (e.g., video stream 1104), determine whether the detected objects are associated with motion activity (e.g., whether an object is associated with a motion activity source or sink), and generate suggested zone definitions for the detected objects. In some implementations, object detection includes comparing imagery (e.g., frames) from the video to one or more objects databases 1130 (e.g., object images database(s) 734) to identify defined objects (e.g., door, chair, couch, window, etc.) in the video. The objects databases 1130 include a collection of images of known objects. In some implementations, the objects databases are machine-trained. In some implementations, there are object databases for different types of objects (e.g., a database of images of doors, another one for windows, yet another one for couches, etc.).
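The comparison of frames against the objects databases 1130 is not specified in detail. As a rough illustration only, the sketch below uses classical sum-of-squared-differences template matching, a swapped-in technique rather than the machine-trained comparison described above. It assumes small grayscale frames stored as nested lists and a hypothetical dictionary mapping object labels to template images; `match_templates` and the `max_mean_sq_error` threshold are illustrative names, not part of the described system.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # grayscale pixel values, row-major


def ssd(frame: Image, template: Image, top: int, left: int) -> int:
    """Sum of squared differences between the template and the frame window at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = frame[top + r][left + c] - value
            total += diff * diff
    return total


def match_templates(
    frame: Image, templates: Dict[str, Image], max_mean_sq_error: float = 100.0
) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Return (label, (top, left, height, width)) for the best-matching position of each
    template, keeping only matches whose mean squared error is within the threshold."""
    detections = []
    frame_h, frame_w = len(frame), len(frame[0])
    for label, template in templates.items():
        t_h, t_w = len(template), len(template[0])
        best: Optional[Tuple[int, int, int]] = None  # (score, top, left)
        # Slide the template over every position where it fits entirely inside the frame.
        for top in range(frame_h - t_h + 1):
            for left in range(frame_w - t_w + 1):
                score = ssd(frame, template, top, left)
                if best is None or score < best[0]:
                    best = (score, top, left)
        if best is not None and best[0] / (t_h * t_w) <= max_mean_sq_error:
            detections.append((label, (best[1], best[2], t_h, t_w)))
    return detections
```

A practical detector would also have to tolerate changes in scale, viewpoint, and lighting, which is one reason a machine-trained database is attractive compared with fixed templates.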

In some implementations, the server system 508/552 also includes a frame extractor and encoder (not shown). The frame extractor and encoder (e.g., frame extraction module 73026, encoding module 73028; FIG. 7B) extracts frames from raw video (e.g., video stream 1104) and encodes the extracted frames into an extracted-frames video. In some implementations, the frame extractor and encoder extracts frames at a predefined rate or a lower rate for portions of the video stream without alert events, and extracts frames at a higher rate for portions with alert events (e.g., the portion with the alert event and bracketing portions before and after the portion with the alert event). In some implementations, the overall, average rate at which the frames are extracted is a predefined rate (e.g., 20 frames per 20 minutes, 60 frames per hour, 1 frame per minute). The server system 508/552 also includes an extracted frames and extracted-frames videos database. In some implementations, extracted frames and extracted-frames videos are part of the server database 732 (e.g., extracted frames and extracted-frames videos database 7330, FIG. 7C).
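The variable extraction rate can be pictured as choosing sample timestamps: sparse samples throughout the video, plus dense samples over each alert event and a bracketing window around it. The following is a minimal sketch under that reading. The function name and the default intervals (one frame per minute, one frame every 5 seconds near events, a 10-second bracket) are hypothetical, and the sketch does not rebalance the base rate to hold the overall average at a predefined target.

```python
from typing import List, Tuple


def extraction_timestamps(
    duration_s: float,
    alert_events: List[Tuple[float, float]],
    base_interval_s: float = 60.0,
    alert_interval_s: float = 5.0,
    bracket_s: float = 10.0,
) -> List[float]:
    """Timestamps (seconds) at which to extract frames: sparse sampling everywhere,
    dense sampling over each (start, end) alert event and a bracket before and after it."""
    stamps = set()
    # Base-rate sampling over the whole video.
    t = 0.0
    while t < duration_s:
        stamps.add(round(t, 3))
        t += base_interval_s
    # Higher-rate sampling around each alert event, clipped to the video bounds.
    for start, end in alert_events:
        t = max(0.0, start - bracket_s)
        stop = min(duration_s, end + bracket_s)
        while t <= stop and t < duration_s:
            stamps.add(round(t, 3))
            t += alert_interval_s
    return sorted(stamps)
```

For a 300-second clip with one alert event from 120 s to 130 s, this yields samples at 0, 60, 180, and 240 s plus every 5 s from 110 s to 140 s, so the event-bearing portion of the extracted-frames video is far denser than the rest.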

The server system 508/552 receives the video stream 1104 from the video source 522 and optionally receives motion event candidate information 1102, such as motion start information, and video source information 1103, such as device settings for camera 118. In some implementations, the event processor sub-module 11060 communicates with the video source 522. The server system sends alerts for motion (and other) events 1105 and event timeline information 1107 to the client device 504. The server system 508/552 optionally receives user information from the client device 504, such as edits on event categories 1109 and zone definitions 1111. The server system also sends to the client device 504 video 1136 (which may be the video stream 1104 or a modified version thereof) and, on request by the client device 504, extracted-frames video 1138. Further, in some implementations, the server system also sends to the client device 504 suggested zone definitions 11111 for detected objects, and receives from the client device user interaction with the suggested zone definitions (e.g., acceptance or rejection of a suggested definition, request for information for a suggested definition).

The data processing pipeline 1112 processes a live video feed received from a video source 522 (e.g., including a camera 118 and an optional controller device) in real time to identify and categorize motion events in the live video feed, and sends real-time event alerts and a refreshed event timeline to a client device 504 associated with a reviewer account bound to the video source 522. The data processing pipeline 1112 also processes stored video feeds from a video source 522 to reevaluate and/or re-categorize motion events as necessary, such as when new information is obtained regarding the motion event and/or when new information is obtained regarding motion event categories (e.g., a new activity zone is obtained from the user).

After video data is captured at the video source 522 (1113), the video data is processed to determine if any potential motion event candidates are present in the video stream. A potential motion event candidate detected in the video data is also sometimes referred to as a cuepoint. Thus, the initial detection of a motion event candidate is referred to as motion start detection and/or cuepoint detection. Motion start detection (1114) triggers performance of a more thorough event identification process on a video segment (also sometimes called a “video slice” or “slice”) corresponding to the motion event candidate. In some implementations, the video data is initially processed at the video source 522. Thus, in some implementations, the video source sends motion event candidate information, such as motion start information, to the server system 508. In some implementations, the video data is processed at the server system 508 for motion start detection. In some implementations, the video stream is stored on server system 508 (e.g., in video and source data database 1106). In some implementations, the video stream is stored on a server distinct from server system 508. In some implementations, after a cuepoint is detected, the relevant portion of the video stream is retrieved from storage (e.g., from video and source data database 1106).
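How motion start is detected is not spelled out here. One common approximation, assumed purely for illustration, is frame differencing: count the pixels whose intensity changes by more than a threshold between consecutive frames, and mark a cuepoint when the changed fraction first rises above an area threshold after a quiet period. The function names and thresholds below are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel values, row-major


def changed_fraction(prev: Frame, curr: Frame, pixel_threshold: int) -> float:
    """Fraction of pixels whose absolute intensity change exceeds pixel_threshold."""
    total = changed = 0
    for prev_row, curr_row in zip(prev, curr):
        for a, b in zip(prev_row, curr_row):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def detect_cuepoints(
    frames: List[Frame], pixel_threshold: int = 25, area_threshold: float = 0.05
) -> List[int]:
    """Indices of frames at which motion starts: the changed fraction rises above
    area_threshold after a pair of consecutive frames that stayed below it."""
    cuepoints = []
    in_motion = False
    for i in range(1, len(frames)):
        moving = changed_fraction(frames[i - 1], frames[i], pixel_threshold) > area_threshold
        if moving and not in_motion:
            cuepoints.append(i)  # rising edge of motion: a motion event candidate begins
        in_motion = moving
    return cuepoints
```

Only the rising edge is reported, so a continuous run of moving frames yields a single cuepoint; the more thorough event identification process would then work on the slice that starts there.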

In some implementations, the more thorough event identification process includes segmenting (1115) the video stream into multiple segments, then categorizing the motion event candidate within each segment (1116). In some implementations, categorizing the motion event candidate includes an aggregation of background factors, motion entity detection and identification, motion vector generation for each motion entity, motion entity features, and scene features to generate motion features (11166) for the motion event candidate. In some implementations, the more thorough event identification process further includes categorizing each segment (11167), generating or updating a motion event log (11168) based on categorization of a segment, generating an alert for the motion event (11169) based on categorization of a segment, categorizing the complete motion event (1119), updating the motion event log (1120) based on the complete motion event, and generating an alert for the motion event (1121) based on the complete motion event. In some implementations, a categorization is based on a determination that the motion event candidate is within a particular zone of interest. In some implementations, a categorization is based on a determination that the motion event candidate involves one or more particular zones of interest.
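The last two sentences distinguish an event that is within a zone from one that involves a zone. A minimal sketch, assuming both zones and the region swept by a motion event candidate are approximated by axis-aligned bounding boxes (the system may instead use contours or masks), could separate the two cases as follows; `zone_labels` and the box layout are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def zone_labels(event_box: Box, zones: Dict[str, Box]) -> Dict[str, List[str]]:
    """Names of zones that fully contain the event ("within") and of zones it touches ("involves")."""
    return {
        "within": [name for name, z in zones.items() if contains(z, event_box)],
        "involves": [name for name, z in zones.items() if intersects(z, event_box)],
    }
```

An event that starts at a door and crosses toward a couch would involve both zones while being within neither, which is why the two determinations can lead to different categorizations.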

In some implementations, one or more objects are detected in the video (1132), and one or more suggested zones are defined for at least some of the detected objects (1134). Image analysis may be performed on images from the video (e.g., frames of the video) to detect one or more objects. Also, the detected motion events may be analyzed and compared to the video to identify source areas and sink areas in the scene depicted in the video. The sources and sinks information may be used as an input into the object detection (e.g., to narrow down the area of object detection in the video), and/or as an input into the suggested zone definition process (e.g., to select which objects get a suggested zone definition). The suggested zones may be presented to the user at the client device.
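Source and sink areas can be estimated from where motion tracks begin and end. The sketch below assumes each tracked motion entity yields a list of (x, y) positions, bins track start and end points into a coarse grid, and treats cells where at least `min_count` tracks start (or end) as source (or sink) cells. The grid binning, parameter values, and function name are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def sources_and_sinks(
    tracks: List[List[Point]], cell_size: float = 10.0, min_count: int = 2
) -> Tuple[Set[Cell], Set[Cell]]:
    """Grid cells where motion tracks frequently begin (sources) or end (sinks)."""

    def cell_of(p: Point) -> Cell:
        return (int(p[0] // cell_size), int(p[1] // cell_size))

    # Count where each non-empty track starts and where it ends.
    starts = Counter(cell_of(t[0]) for t in tracks if t)
    ends = Counter(cell_of(t[-1]) for t in tracks if t)
    sources = {c for c, n in starts.items() if n >= min_count}
    sinks = {c for c, n in ends.items() if n >= min_count}
    return sources, sinks
```

A doorway typically shows up as both a source and a sink, since people appear there when entering and disappear there when leaving.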

In some implementations, frames are extracted from the video and an extracted-frames video is encoded from the extracted frames. In some implementations, more frames are extracted per unit time of video from portions of the video during and proximate to the start and end of alert events (e.g., proximate to cuepoints) than from portions of the video without alert events. Thus, portions of the extracted-frames video without alert events have fewer frames per unit time than portions of the extracted-frames video with alert events.

The event analysis and categorization process may be performed by the video source 522 and the server system 508/552 cooperatively, and the division of the tasks may vary in different implementations, for different equipment capability configurations, and/or for different network and server load situations. After the server system 508 categorizes the motion event candidate, the result of the event detection and categorization may be sent to a reviewer associated with the video source 522.

In some implementations, the server system 508/552 also determines an event mask for each motion event candidate and caches the event mask for later use in event retrieval based on selected zone(s) of interest.
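Caching an event mask lets the server answer later zone-based queries, including queries for a zone defined after the event occurred, without reprocessing the video. A minimal sketch, assuming a mask is simply the set of coarse grid cells touched by motion during the event; the `EventMaskCache` class and its methods are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Cell = Tuple[int, int]


class EventMaskCache:
    """Caches a per-event motion mask (the set of grid cells touched by motion)
    so that events can later be retrieved by zone without reprocessing video."""

    def __init__(self) -> None:
        self._masks: Dict[str, FrozenSet[Cell]] = {}

    def add(self, event_id: str, motion_cells: Iterable[Cell]) -> None:
        """Store (or replace) the mask for one motion event candidate."""
        self._masks[event_id] = frozenset(motion_cells)

    def events_in_zone(self, zone_cells: Iterable[Cell]) -> List[str]:
        """IDs of cached events whose mask overlaps the selected zone, in insertion order."""
        zone = frozenset(zone_cells)
        return [eid for eid, mask in self._masks.items() if mask & zone]
```

Because the query only intersects two small sets, selecting a different zone of interest in the review interface can filter past events immediately.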

In some implementations, the server system 508/552 stores raw or compressed video data (e.g., in a video and source data database 1106), event categorization models (e.g., in an event categorization model database 1108), and event masks and other event metadata (e.g., in an event data and event mask database 1110) for each of the video sources 522. In some implementations, the video data is stored at one or more display resolutions such as 480p, 720p, 1080i, 1080p, and the like. In some implementations, the server system 508/552 also stores the extracted-frames video in the same or a similar database (e.g., in an extracted frames and extracted-frames video database 1130).

It should be appreciated that while the description of FIGS. 11A-11B refers to motion events, the system architecture 1100 and the data processing pipeline 1112 apply similarly to other types of events or alerts (e.g., alerts or events from other smart devices 204, such as hazard alerts). Indeed, such alerts and events may be processed together alongside motion events in the same system architecture 1100 and pipeline 1112.

In some implementations, one or more of the modules and data stores associated with server system 508 or 552 (FIGS. 5A-5B) may be located in the camera (e.g., camera 118) itself and/or in a computing device or system local to the camera (e.g., a server or digital video recorder device or hub device (e.g., hub device 180) located in the same house as the camera 118). In some implementations, one or more of the operations that are described as being performed at or by the server system 508 or 552 may be performed by the camera itself and/or by the computing device or system local to the camera. For example, the camera and/or the local device/system may include analogous modules and data stores for processing the video feed captured by the camera to detect objects (e.g., object detection module 73032 or 976), to identify sources and sinks (e.g., sources and sinks detection module 73034 or 978), and to generate suggested zone definitions for detected objects (e.g., zone definition module 73036 or 980).

Example User Interfaces

FIGS. 12A-12E illustrate example user interfaces on a client device for presenting suggested zones in accordance with some implementations. FIG. 12A illustrates a user interface 1200. In some implementations, the user interface 1200 is displayed on a client device 504 (e.g., a mobile device, a desktop computer, a laptop computer). In some implementations, user interface 1200 is a user interface of an application (e.g., application 824 corresponding to client-side module 502) that is a dedicated smart home management application (also referred to below as a “smart home application”). In some implementations, user interface 1200 is a user interface of a smart home management website, where the user interface 1200 is displayed in a web browser application on the client device 504. For ease of understanding, user interface 1200 is described below as being displayed in a web browser application on a desktop or laptop computer with a mouse or similar pointing device for controlling, among other things, a mouse pointer.

User interface 1200 includes a video region 1202 and a timeline 1204. It should be appreciated that user interface 1200 may include additional components or elements not shown or called out in the figure.

Video region 1202 is an area or region of the user interface 1200 where video is displayed. The video displayed in the video region 1202 is live (e.g., streaming) or previously recorded video transmitted from server 508 or 552 to the client device 504. The video transmitted from the server 508 or 552 is originally captured by a camera 118, processed by and/or stored at server 508 or 552, and received from the server 508 or 552 by the client device 504.

The timeline 1204 indicates availability of recorded video and detected events for a camera 118. The timeline 1204 includes elements that indicate that video was captured by camera 118 and stored at server 508 or 552 (and available for viewing), with absence of the element in a time span indicating that video was not stored for the time span (e.g., because camera 118 was turned off, because the camera 118's network connectivity failed). The timeline 1204 also shows, for spans of time for which video was captured and stored, events detected in the captured video.

Video displayed in the video region 1202 depicts a scene 1206 that includes one or more objects (e.g., objects 1208, 1210) detected by the server 508/552 or by the camera 118, and for which a suggested zone has been defined. These objects with suggested zone definitions are marked with object markers 1212 (e.g., markers 1212-A and 1212-B) in order to call them out to a user viewing the video. A user may hover a mouse pointer 1214 over a marked object, such as object 1208, as shown in FIG. 12B. When the mouse pointer hovers over object 1208, the object 1208 is displayed with highlighting (e.g., different shading, thicker borders, different color).

As shown in FIG. 12C, object 1208 is highlighted when mouse pointer 1214 hovers over it. In some implementations, when mouse pointer 1214 is hovered over object 1208, a call-out 1216 is displayed, as shown in FIG. 12C. The call-out 1216 includes one or more thumbnails 1218 (e.g., thumbnails 1218-A and 1218-B). A thumbnail 1218 shows a frame from a portion of the video that corresponds to a detected event associated with the suggested zone for object 1208. In some implementations, call-outs from the timeline for respective thumbnails of respective video portions are also displayed.

The user may select one of the thumbnails 1218. For example, FIG. 12D shows mouse pointer 1214 moved over thumbnail 1218-A. The user may then select the thumbnail 1218-A (e.g., by clicking on the mouse while mouse pointer 1214 is positioned over thumbnail 1218-A). In response to selection of the thumbnail 1218-A, the video in the video region 1202 plays from the video portion corresponding to the thumbnail 1218-A; the user can view the video portion with the event corresponding to thumbnail 1218-A. For example, in FIG. 12E, the video portion played shows scene 1206 with objects 1208 and 1210, and person 1220 walking around (and triggering the motion event detection). In some implementations, the timeline also jumps to the time of the video portion, and playback of the video portion starts from that time.

In some implementations, the user can accept or reject a marked object as a suggested zone. For example, while mouse pointer 1214 is positioned over object 1208, as in FIG. 12B, the user may select the object 1208 (e.g., by clicking on the mouse). In response to selection of the object 1208, a prompt or other affordance to accept or reject the object 1208 as a defined zone may be displayed. In some implementations, the boundaries of the defined zone follow the contours of the object 1208 as detected by the server 508/552 or by the camera 118. In some implementations, the boundaries of the defined zone enclose the object 1208 (and do not necessarily follow the contours of the object), and form a polygonal region (e.g., a rectangle, a parallelogram, etc.). If the user accepts the object 1208 as a zone, the area of the object 1208 in the video becomes a zone like a user-defined zone; motion events detected as occurring in the zone are indicated as events associated with the zone, and notifications may be generated for them. If the user rejects the object 1208 as a zone, the suggested zone definition for the object 1208 is ignored, and motion events detected as occurring in what otherwise would be the zone for the object 1208 are considered to be motion events not associated with a particular zone, unless the motion events happen to overlap with another zone, in which case the events will be associated with that zone. In some implementations, a rejected suggested zone is deleted from suggested zone definitions 736 and 840. In some implementations, the user may also designate a suggested zone that has been accepted as an alerting zone or a suppression zone. If the zone is an alerting zone, the server generates alerts or notifications for motion events detected in the zone. If the zone is a suppression zone, the server forgoes generating alerts and notifications for motion events detected in the zone. In some implementations, rejection of a suggested zone is implicitly considered a designation of the suggested zone as a suppression zone (e.g., if the rejected suggested zone is retained in suggested zone definitions 736/840 after being rejected). Further, in some implementations, a suggested zone definition, after being accepted by the user, may be subsequently edited by the user (e.g., the user may edit the size and boundaries of the zone, the user may delete the zone, the user may disable alerts and notifications for the zone).
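The accept, reject, and designate options described above amount to a small state model for each suggested zone. The following sketch is one hypothetical way to encode it. The class and enum names are illustrative, alerting is assumed to be the default designation on acceptance (consistent with the prompt described in connection with FIG. 14F), and the `retain_as_suppression` flag models the variant in which a rejected zone is kept and treated as a suppression zone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ZoneStatus(Enum):
    SUGGESTED = "suggested"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class ZoneMode(Enum):
    ALERTING = "alerting"
    SUPPRESSION = "suppression"


@dataclass
class SuggestedZone:
    name: str  # e.g., named after the detected object, such as "Door"
    status: ZoneStatus = ZoneStatus.SUGGESTED
    mode: Optional[ZoneMode] = None  # None: the zone does not affect notifications

    def accept(self, mode: ZoneMode = ZoneMode.ALERTING) -> None:
        """Accept the suggestion; alerting is assumed to be the default designation."""
        self.status = ZoneStatus.ACCEPTED
        self.mode = mode

    def reject(self, retain_as_suppression: bool = False) -> None:
        """Reject the suggestion. If retained, treat it as an implicit suppression zone;
        otherwise it has no mode (a caller would delete it from stored definitions)."""
        self.status = ZoneStatus.REJECTED
        self.mode = ZoneMode.SUPPRESSION if retain_as_suppression else None

    def designate(self, mode: ZoneMode) -> None:
        """Switch an accepted zone between alerting and suppression."""
        if self.status is not ZoneStatus.ACCEPTED:
            raise ValueError(f"zone {self.name!r} must be accepted before it is designated")
        self.mode = mode
```

Keeping status and mode separate makes it easy to express the implicit-suppression variant without conflating a rejected zone with one the user explicitly accepted and silenced.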

In some implementations, an object as detected by the server 508/552 or by the camera 118 may include multiple objects in real life. For example, if in the video a couch and a coffee table are close together (e.g., overlapping), they may be detected together as a single object, and the detected contours of the single object follow the outer edges of the two real-life objects as if they were one shape, and/or the boundaries of the suggested zone definition enclose both the couch and the coffee table.
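When detections overlap, a suggested zone that encloses all of them can be formed by repeatedly taking the union of overlapping bounding rectangles. This sketch covers only the rectangular-enclosure variant, not contour following. It assumes boxes as (x_min, y_min, x_max, y_max) tuples and treats boxes that merely touch as overlapping; `merge_overlapping` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes overlap or touch along an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly replace any two overlapping boxes by their enclosing box
    until no two remaining boxes overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break  # restart the scan, since merged[i] grew
            if changed:
                break
    return merged
```

With a couch at (10, 40, 60, 70), a coffee table at (50, 60, 80, 75), and a separate door, the couch and table collapse into one enclosing box (10, 40, 80, 75) while the door is left alone. Restarting the scan after each merge ensures that chains of overlaps (A overlaps B, B overlaps C) end up in one box.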

In some implementations, the thumbnail call-out (e.g., call-out 1216) is displayed when the user selects the object (e.g., object 1208) once, and then the prompt or affordance to accept or reject the suggested zone is displayed when the user selects the object again.

As described above, a prompt to accept or reject a suggested zone definition, or to designate a suggested zone as an alerting zone or a suppression zone, may be displayed to the user. An example of such a prompt is shown in FIG. 14F. FIG. 14F shows a user interface that includes a frame of the video with the suggested zone enclosing the detected object highlighted (in the case of the user interface in FIG. 14F, the detected object is a door) in the frame. The user interface includes an affordance to reject the suggested zone and/or designate the suggested zone as a suppression zone (e.g., a button with the label “I'll pass”) and an affordance to accept the suggested zone and/or designate the suggested zone as an alerting zone (e.g., a button with the label “Got it”). The user interface in FIG. 14F assumes that the default is that the zone is accepted and designated as an alerting zone, so the affordance button “Got it” accepts the default action, and the affordance button “I'll pass” rejects the default action and rejects the zone and/or designates the zone as a suppression zone. The affordance labels and functions may differ depending on the particular implementation, where the default actions and the specifics of the prompt may differ.

In some implementations, an alert or notification for motion activity detected in an accepted suggested zone is displayed to the user. An example of such a notification is shown in FIG. 14G. The notification may include a frame of the video and text indicating that there is motion activity detected in the suggested zone (named after the corresponding detected object (e.g., “Door”)). In some implementations, alerts or notifications, such as the one shown in FIG. 14G, are displayed for motion activity detected in a suggested zone that had been accepted and designated as an alerting zone. In some implementations, an alert such as the one shown in FIG. 14G is displayed for the first motion activity detected in a newly generated suggested zone definition after generation of the definition, and selection of the notification by the user triggers display of a user interface to accept or reject the zone and/or designation of the zone as alerting or suppression, as in the user interface shown in FIG. 14F.

Example Processes

FIG. 13 illustrates a flowchart diagram of a method 1300 for defining suggested zones, in accordance with some implementations. In some implementations, the method 1300 is performed at a computing system with one or more processors and one or more memory components. For example, in some implementations, the method 1300 is performed by server system 508 or 552, or one or more components thereof (e.g., object detection module 73032, sources and sinks detection module 73034, zone definition module 73036, etc.). In some implementations, the method 1300 is governed by instructions that are stored in a non-transitory computer readable storage medium (e.g., the memory 722) and the instructions are executed by one or more processors of a computing system (e.g., the CPUs 718). In some implementations, the method 1300 is performed jointly by the server system 508/552 and a camera 118. In some implementations, the method 1300 is performed by a camera 118, or by a camera 118 and a hub device 180.

The computing system obtains (1302) video of an environment including a plurality of objects, where the video has a field of view. The server system 508/552 obtains video of an environment from a camera 118. The video has a field of view (e.g., scene 1206) and captures one or more objects (e.g., objects 1208, 1210).

The computing system identifies (1304) one or more objects of the plurality of objects within the field of view. The server system 508/552 processes and analyzes the video to detect and identify one or more objects (e.g., objects 1208, 1210) in the scene 1206 captured by the video.

The computing system defines (1306) a zone of interest associated with a first object of the one or more objects, including identifying the zone of interest as one of an alerting zone or a suppression zone. The server system 508/552 processes and analyzes the video, as well as detected motion events, to define a suggested zone for one or more of the detected and identified objects.

Subsequent to the defining, the computing system detects (1308) one or more motion events captured in the video occurring at least partially within the zone of interest. After definition of the zone, the server system 508/552 continues to detect motion events, and one or more motion events in the defined zone may be detected.

When the zone of interest is an alerting zone, the computing system causes (1310) one or more notifications of the one or more motion events to be issued. The server 508/552 issues one or more alerts for motion events detected in the zone if the zone is an alerting zone.

When the zone is a suppression zone, the computing system suppresses (1312) notifications of the one or more motion events. The server 508/552 forgoes issuing one or more alerts for motion events detected in the zone if the zone is a suppression zone.
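Steps 1308 through 1312 can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the implementation described here: it assumes zones and motion events are both represented as sets of grid cells, and the names `ZoneType`, `Zone`, `MotionEvent`, and `events_to_notify` are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ZoneType(Enum):
    ALERTING = "alerting"
    SUPPRESSION = "suppression"


@dataclass(frozen=True)
class Zone:
    name: str
    zone_type: ZoneType
    cells: frozenset  # hypothetical: grid cells (x, y) covered by the zone


@dataclass(frozen=True)
class MotionEvent:
    event_id: int
    cells: frozenset  # hypothetical: grid cells (x, y) touched by the motion


def events_to_notify(zone, events):
    """Return ids of motion events that should trigger a notification."""
    # Step 1308: keep events that occur at least partially within the zone.
    in_zone = [e for e in events if e.cells & zone.cells]
    if zone.zone_type is ZoneType.ALERTING:
        # Step 1310: issue a notification for every in-zone event.
        return [e.event_id for e in in_zone]
    # Step 1312: suppression zone, so no notifications are issued.
    return []
```

Partial overlap is tested with set intersection, which matches the "at least partially within" condition without requiring the whole event to fall inside the zone.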

In some implementations, defining a zone of interest associated with a first object of the one or more objects includes identifying in the field of view one or more source zones and one or more sink zones, determining contours of the first object within the field of view and a first area of the first object defined by the determined contours, determining amounts of overlap between the area of the first object and the source zones and sink zones in the field of view, and, in accordance with a determination that the area of the first object overlaps an area of a source zone or a sink zone by at least a predefined threshold amount, defining the area of the first object as the zone of interest. In some implementations, the server identifies source areas and sink areas of motion events in the video scene, determines contours of the objects and the corresponding areas bound by the respective contours, and determines amounts of overlap between the object areas and the source/sink areas. If an object area overlaps a source/sink area by more than a threshold, the corresponding object is “selected” from the multiple objects, and the corresponding area bound by its contours is set as a suggested zone definition for the object. In this manner, source and sink areas serve as an input for narrowing down the set of objects for which suggested zones are defined.
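The overlap test above might look like the following sketch. It assumes object areas and source/sink areas are sets of grid cells, and it expresses the "predefined threshold amount" as a fraction of the object's own area; both choices, and the names `rect_cells` and `suggest_zones`, are illustrative assumptions.

```python
def rect_cells(x0, y0, x1, y1):
    """Cells of an axis-aligned rectangle, end-exclusive (illustrative helper)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def suggest_zones(object_areas, source_sink_areas, threshold=0.5):
    """Select objects whose area overlaps a source or sink area enough.

    object_areas: dict of object name -> set of cells bound by its contours.
    source_sink_areas: list of sets of cells (source and sink areas).
    threshold: minimum fraction of the object's area that must overlap a
        single source/sink area (an assumed way to express the threshold).
    """
    zones = {}
    for name, area in object_areas.items():
        if not area:
            continue
        best = max((len(area & s) / len(area) for s in source_sink_areas), default=0.0)
        if best >= threshold:
            zones[name] = set(area)  # the object's own area becomes the zone
    return zones
```

Measuring overlap relative to the object (rather than to the source/sink area) keeps a small object that sits entirely inside a large source area eligible; other normalizations, such as intersection over union, would be equally reasonable.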

In some implementations, the computing system detects a change in the field of view, identifies the first object within the changed field of view, determines contours of the first object within the changed field of view and a second area of the first object defined by the determined contours within the changed field of view, and defines the second area of the first object as the zone of interest. The camera 118 may be disturbed (e.g., the camera was rotated or moved), causing the scene captured in the video of the camera 118 to change. The server receives the video from the disturbed camera, detects the objects in the changed scene, and determines the contours and areas of the objects within the changed scene. For an object with a defined suggested zone, the suggested zone is redefined as the area of the object bound by the post-disturbance contours. In some implementations, the server detects a change in the scene (e.g., by detecting certain pixel changes) and determines that the camera was disturbed based on the detected scene change. In response to the determination that the camera was disturbed, the server performs object detection again to re-detect the objects and their corresponding contours, and redefines the suggested zones for the re-detected objects; in this way, the server automatically updates the object detection and zone definitions in response to a disruption of the camera. In some implementations, the user is provided with an alert that the camera was disrupted and that the objects and zones have been automatically updated.
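One way to detect a disturbed camera "by detecting certain pixel changes" is sketched below, under loudly stated assumptions: frames are 2-D lists of grayscale values, and a disturbance is declared when most pixels change at once, since ordinary motion usually changes only a small part of the frame. The thresholds and the callable `detect_objects` are hypothetical.

```python
def changed_fraction(prev, curr, pixel_delta=30):
    """Fraction of pixels whose grayscale value changed by more than pixel_delta."""
    total = changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(p - c) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def camera_disturbed(prev, curr, pixel_delta=30, scene_fraction=0.6):
    # Ordinary motion changes a small part of the frame; a rotated or moved
    # camera changes most of it (assumed heuristic and thresholds).
    return changed_fraction(prev, curr, pixel_delta) >= scene_fraction


def refresh_zones(prev, curr, zones, detect_objects):
    """Redefine zones from a fresh detection if the camera was disturbed.

    detect_objects is a hypothetical callable: frame -> {name: cells}.
    Returns (zones, disturbed).
    """
    if not camera_disturbed(prev, curr):
        return zones, False
    detected = detect_objects(curr)
    # Keep only zones whose object is found again, using its new area.
    return {name: detected[name] for name in zones if name in detected}, True
```

A global lighting change (lights switched on) could also trip this heuristic; a production system would likely compare structure (edges, features) rather than raw intensity.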

In some implementations, the computing system detects one or more motion events captured in the video occurring at least partially within the field of view over a predefined period of time. The server 508/552 detects one or more motion events occurring in the scene 1206. These motion events may be analyzed to identify source and sink areas in the scene 1206.

In some implementations, identifying one or more source zones includes determining a number of the motion events that originated from a first region of the field of view within the predefined period of time, determining whether the number of originated motion events exceeds a first predefined threshold, and in accordance with a determination that the number of originated motion events exceeds the first predefined threshold, identifying the region of the field of view as a source zone. The server identifies a source area by analyzing the detected motion events to determine their origination regions in the scene 1206 and how many detected motion events originate from respective regions in the scene 1206. Regions whose number of motion event originations exceeds a threshold are identified as source areas.

In some implementations, identifying one or more sink zones includes determining a number of the motion events that terminated at a second region of the field of view within the predefined period of time, determining whether the number of terminated motion events exceeds a second predefined threshold, and in accordance with a determination that the number of terminated motion events exceeds the second predefined threshold, identifying the region of the field of view as a sink zone. Similarly, sink areas in the scene 1206 may be identified by analyzing the motion events to determine their termination regions in the scene 1206. Regions whose number of motion event terminations exceeds a threshold are identified as sink areas.
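The source and sink counting described in the two preceding paragraphs can be sketched together. This assumes, for illustration only, that each motion event is available as a timestamped track of (x, y) points and that "regions" are cells of a fixed grid; the grid size, thresholds, and function names are hypothetical.

```python
from collections import Counter


def region_of(point, cell_size):
    """Map an (x, y) pixel to a coarse grid region (an assumed region scheme)."""
    x, y = point
    return (x // cell_size, y // cell_size)


def find_sources_and_sinks(tracks, start, end, cell_size=32,
                           source_threshold=5, sink_threshold=5):
    """tracks: list of (timestamp, path); path is a list of (x, y) points of one motion event."""
    origins, terminations = Counter(), Counter()
    for timestamp, path in tracks:
        if not path or not (start <= timestamp < end):
            continue  # outside the predefined period, or empty track
        origins[region_of(path[0], cell_size)] += 1
        terminations[region_of(path[-1], cell_size)] += 1
    # A region qualifies only if its count exceeds the threshold.
    sources = {r for r, n in origins.items() if n > source_threshold}
    sinks = {r for r, n in terminations.items() if n > sink_threshold}
    return sources, sinks
```

The strict `>` comparison mirrors "exceeds" in the text; a region with exactly the threshold number of events is not selected.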

In some implementations, defining a zone of interest associated with a first object of the one or more objects includes defining the zone of interest using machine learning (e.g., image analysis, object detector algorithm). In some implementations, defining the zone of interest using machine learning includes comparing one or more frames of the video to a database of images of known objects. The server uses machine learning algorithms and processes (e.g., neural networks, image analysis processes, object detector processes) to detect objects and their contours, and thus to define zones for one or more of these objects by the areas bound by the detected contours. The object and contour detection may include using the image analysis and/or object detector processes to compare images from the video (e.g., individual frames of the video) to one or more databases of images of known objects. In some implementations, each database is associated with a particular type of object. For example, one database may be for images of doors; another database may be for images of couches, sofas, chairs, and the like; and so forth.

In some implementations, identifying one or more objects of the plurality of objects within the field of view includes identifying one or more source zones in the field of view, detecting one or more shapes in the source zones, and for a detected shape, identifying an object to which the detected shape corresponds. The server may detect source areas as described above and use the source areas as an input for narrowing the areas of the scene 1206 that will be processed for detection of objects. For example, regions of the scene 1206 corresponding to source areas are processed to detect shapes located within them, and objects corresponding to the detected shapes are identified. In some implementations, detecting one or more shapes in the source zones includes performing edge detection on the source zones. Shape detection may include detecting edges in the source areas to find lines and their intersections, and determining which of these lines and intersections form the edges of an object in the source area.
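To show how edge detection can be confined to a source area, here is a minimal sketch that uses central-difference gradients as a simple stand-in for a full edge detector (a real system would more likely use something like Canny). It assumes a 2-D list of grayscale values and a rectangular region; the name `edge_pixels` and the threshold are hypothetical.

```python
def edge_pixels(image, region, threshold=50):
    """Return (x, y) pixels inside region whose gradient magnitude exceeds threshold.

    image: 2-D list of grayscale values; region: (x0, y0, x1, y1), end-exclusive.
    Uses central differences as a simple stand-in for a full edge detector.
    """
    x0, y0, x1, y1 = region
    height, width = len(image), len(image[0])
    edges = set()
    # Skip the outermost image border so every neighbour index is valid.
    for y in range(max(y0, 1), min(y1, height - 1)):
        for x in range(max(x0, 1), min(x1, width - 1)):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if abs(gx) + abs(gy) > threshold:
                edges.add((x, y))
    return edges
```

Restricting the loops to the region is where the source (or sink) areas narrow the work: pixels outside them are never examined. Lines and their intersections would then be fitted to the returned edge pixels in a later step not shown here.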

In some implementations, identifying an object to which the detected shape corresponds includes selecting a subset of a set of multiple object databases based on the detected shape, and searching the selected subset of object databases to identify an object that best matches the detected shape. In some implementations, selecting a subset of a set of multiple object databases based on the detected shape includes determining a type of the detected shape, and selecting the subset of the set of object databases in accordance with the determined shape type. The particular shape detected may be used as an input for selecting which databases of images of objects to use for identifying the object to which the shape corresponds. For example, if the shape is a 4-sided, approximately regular polygon, the database for images of doors may be used and the database for images of couches may be excluded; doors match well with a 4-sided regular polygon, whereas couches tend to be a poor match for a 4-sided polygon.
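The database-subset selection and best-match search can be sketched as below. The shape classification by vertex count, the `SHAPE_TO_DATABASES` mapping, and the use of fixed-length feature vectors compared by Euclidean distance are all illustrative assumptions, not details from this specification.

```python
import math

# Hypothetical mapping from coarse shape type to the databases worth searching.
SHAPE_TO_DATABASES = {
    "quadrilateral": ["doors", "windows"],
    "other": ["couches", "chairs"],
}


def shape_type(vertices):
    """Classify a detected outline by vertex count (a deliberate simplification)."""
    return "quadrilateral" if len(vertices) == 4 else "other"


def identify_object(vertices, descriptor, databases):
    """Search only the databases selected for this shape type.

    databases: dict of database name -> list of (label, reference_descriptor),
    where descriptors are hypothetical fixed-length feature vectors.
    """
    best_label, best_distance = None, math.inf
    for name in SHAPE_TO_DATABASES[shape_type(vertices)]:
        for label, reference in databases.get(name, []):
            distance = math.dist(descriptor, reference)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label
```

Note that a quadrilateral whose descriptor resembles a couch is still labeled a door here, because the couch database is never searched; this is the intended effect of pruning, and also its risk if the shape classifier errs.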

In some implementations, identifying one or more objects of the plurality of objects within the field of view includes identifying one or more sink zones in the field of view, detecting one or more shapes in the sink zones, and for a detected shape, identifying an object to which the detected shape corresponds. In some implementations, detecting one or more shapes in the sink zones includes performing edge detection on the sink zones. In some implementations, identifying an object to which the detected shape corresponds includes selecting a subset of a set of multiple object databases based on the detected shape, and searching the selected subset of object databases to identify an object that best matches the detected shape. In some implementations, selecting a subset of a set of multiple object databases based on the detected shape includes determining a type of the detected shape, and selecting the subset of the set of object databases in accordance with the determined shape type. The server may detect sink areas as described above, and use the sink areas as an input for narrowing the areas of the scene 1206 that will be processed for detection of objects, similar to the use of source areas as an input for narrowing the areas of the scene 1206 that will be processed for detection of objects as described above.

In some implementations, the computing system transmits to a client device information corresponding to the zone of interest, where the client device is configured to enable a user of the client device to review the zone of interest. The server 508/552 transmits information corresponding to suggested zones to a client device 504, and the suggested zones are displayed in the video (e.g., as described above in relation to FIGS. 12A-12E) played at the client device 504. The user interface 1200 displayed at the client device 504 is configured to allow the user to review the suggested zones (e.g., view thumbnails 1218 of associated events, and accept or reject the suggested zone).

In some implementations, the computing system determines a label associated with the first object and associates the label with the zone of interest. In some implementations, the server assigns a label to a suggested zone. For example, an assigned label may be as simple as what the object is (e.g., “Door,” “Window,” “Chair”).

In accordance with some implementations, a method for presenting suggested zones is performed at a computing system with one or more processors, a display, and memory. For example, in some implementations, the method is performed by a client device 504. In some implementations, the method is governed by instructions that are stored in a non-transitory computer readable storage medium (e.g., the memory 806) and the instructions are executed by one or more processors of the computing system (e.g., the CPUs 802).

The computing system displays on the display a video of an environment captured by a remote video camera, the video comprising a field of view of the environment, where the field of view encompasses a plurality of objects. The client device 504 displays video from the camera 118 in video region 1202 of user interface 1200. The video captures a scene 1206 of an environment with one or more objects (e.g., objects 1208, 1210).

The computing system displays a suggested zone of interest associated with a first object of the plurality of objects. The client device 504 highlights an object for which a suggested zone is defined. For example, FIG. 12C shows the marked and highlighted object 1208, for which a suggested zone is defined.

The computing system provides an affordance indicating an opportunity for a user of the computing system to accept or reject the suggested zone of interest. The computing system receives a user response, via the affordance, reflecting a user acceptance or rejection of the suggested zone of interest. If the user selects a marked object, the client device 504 may display a prompt to accept or reject the suggested zone corresponding to the marked object. The user interacts with the affordance to accept or reject the suggested zone.

When the user response is the user acceptance, the computing system subsequently provides or suppresses alerts to the user in response to one or more motion events detected as occurring at least partially within the suggested zone. If the user accepts the suggested zone, the client device 504 presents or forgoes presenting alerts to the user for motion events detected in the accepted zone. Whether alerts are presented or forgone depends on whether the accepted zone is an alerting zone or a suppression zone. In some implementations, whether alerts are presented depends on whether alerts are provided by the server 508/552 in accordance with a designation of the zone as an alerting zone or a suppression zone.

In some implementations, the computing system receives a user designation of the suggested zone as an alerting zone or a suppression zone. In some implementations, providing or suppressing alerts to the user in response to one or more motion events detected as occurring at least partially within the suggested zone includes providing one or more alerts to the user in response to the one or more motion events in accordance with the designation of the suggested zone as an alerting zone. In some implementations, providing or suppressing alerts to the user in response to one or more motion events detected as occurring at least partially within the suggested zone includes suppressing alerts to the user in response to the one or more motion events in accordance with the designation of the suggested zone as a suppression zone. The user may also designate the suggested zone as an alerting zone or a suppression zone. The client device 504 provides alerts for motion events detected in an accepted zone designated as an alerting zone, and suppresses alerts for motion events detected in an accepted zone designated as a suppression zone.

In some implementations, the computing system detects user selection of the suggested zone of interest and, in accordance with the user selection of the suggested zone of interest, displays thumbnails of one or more video segments, each of the one or more video segments associated with a motion event detected as having occurred at least partially in the suggested zone of interest. When the user selects the object, or hovers a mouse pointer or cursor over the object, a call-out 1216 of thumbnails of video portions corresponding to motion events detected in the suggested zone is displayed.

In some implementations, when the user response is the user rejection, the computing system classifies the one or more motion events detected as occurring at least partially within the suggested zone as motion events not associated with a specific zone of interest. If the user rejects the zone, motion events detected in the rejected zone may be classified as motion events not associated with any particular zone, or as associated with another zone that the motion event overlaps.
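The reclassification of events after a rejection can be sketched as a filter over zones. This is a hypothetical illustration: zones are assumed to be sets of grid cells keyed by name, and the function name `zones_for_event` is invented for the example.

```python
def zones_for_event(event_cells, zones, rejected):
    """Return names of accepted zones the event overlaps, sorted.

    zones: dict of zone name -> set of cells; rejected: names of rejected zones.
    An empty list means the event is not associated with a specific zone.
    """
    return sorted(
        name for name, cells in zones.items()
        if name not in rejected and event_cells & cells
    )
```

An event that touched only the rejected zone ends up with no zone, while an event that also overlaps another accepted zone keeps that association, matching the two outcomes described above.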

In some implementations, boundaries of the suggested zone of interest associated with the first object follow contours of the first object in the field of view. For example, FIG. 12C shows the contours of the object 1208 highlighted; the boundaries of the suggested zone follow the contours of the object.

In some implementations, the processing and analysis of the video by the server system 508/552 to detect one or more objects and generate suggested zones includes processing and analyzing a single frame of the video, potentially at different scales and/or resolutions, for the presence of a particular object (e.g., a door), and generating a suggested zone definition for the analyzed frame based on which region of the frame (and potentially which scale and/or resolution) yields the highest confidence with respect to detection of the particular object. This analysis may be performed for multiple types of objects (e.g., doors, windows, etc.) to detect particular objects in the video. In some implementations, this single-frame analysis may be performed multiple times in response to various events. For example, the analysis may be performed at different times of the day (such as when maximum ambient light is expected based on the geographic location of the camera and the time of day), in response to a detected change in placement of the camera, or in response to a threshold amount of ambient light being detected by the camera.
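The multi-scale, single-frame search for the most confident region might be organized as in this sketch. The detector itself is out of scope and is represented by a hypothetical callable returning (confidence, box) pairs in the rescaled frame's coordinates; the scales and the name `best_single_frame_detection` are assumptions.

```python
def best_single_frame_detection(frame, detector, scales=(1.0, 0.5, 0.25)):
    """Run a detector at several scales and keep the most confident region.

    detector is a hypothetical callable (frame, scale) -> [(confidence, box)],
    where box = (x0, y0, x1, y1) in the coordinates of the rescaled frame.
    Returns (confidence, box_in_original_coordinates, scale) or None.
    """
    best = None
    for scale in scales:
        for confidence, box in detector(frame, scale):
            if best is None or confidence > best[0]:
                # Map the box back to full-resolution coordinates.
                original = tuple(round(v / scale) for v in box)
                best = (confidence, original, scale)
    return best
```

Mapping the winning box back to full resolution lets the suggested zone be stored in one coordinate system regardless of which scale produced it; the same loop would be repeated per object type (doors, windows, and so on).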

In some implementations, analysis of the video by the server system 508/552 to detect one or more objects includes processing multiple frames from the video over time (e.g., analyzing a number of frames per unit time over a predefined time period) and aggregating the analysis results (e.g., areas and/or contours of detected objects, suggested zone definitions, and/or confidence levels). For example, over a 24-hour period, frames may be sampled from the video and analyzed at a rate of one frame per hour. This periodic sampling and analysis may facilitate analysis of the same scene over time to account for different lighting conditions, shadow positions, and so forth. Analysis of each hourly-sampled frame may yield various results regarding detected objects, corresponding contours/regions, and/or associated confidence levels. In some implementations, for each hourly-sampled frame, a detected object and its corresponding contour and region (or a polygonal region enclosing the detected object) may be made a suggested zone definition for that frame without necessarily considering source and/or sink areas.

The detected objects and suggested zone definitions for the sampled frames over the entire predefined period may be aggregated. In some implementations, the aggregation includes comparing the detected objects and suggested zone definitions and looking for areas of intersection over the multiple sampled frames. The areas of the detected objects may be treated like heatmaps, and areas of intersection with the highest confidence levels are set as the area of the detected object. In this manner, objects may be detected for video captured by a newly installed camera 118 that may not have sufficient motion activity captured for sources and sinks analysis.
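The heatmap-style aggregation can be sketched as confidence-weighted voting over grid cells. This is one possible reading, not the specified method: it assumes each sampled frame (for example, each of 24 hourly frames) contributes a (confidence, cells) pair for one object, with confidence 0.0 and no cells when the object was not found, and it keeps cells that collect at least `min_share` of the total confidence. Both the representation and the cutoff are assumptions.

```python
from collections import defaultdict


def aggregate_samples(samples, min_share=0.5):
    """Combine per-frame detections of one object into a single area.

    samples: list of (confidence, cells) pairs, one per sampled frame; use
    (0.0, set()) for a frame in which the object was not found.
    Each cell accumulates the confidence of every frame that contained it,
    like a heatmap; cells reaching min_share of the total confidence are kept.
    """
    heat = defaultdict(float)
    total = 0.0
    for confidence, cells in samples:
        total += confidence
        for cell in cells:
            heat[cell] += confidence
    if total == 0:
        return set()
    return {cell for cell, h in heat.items() if h >= min_share * total}
```

Cells that appear only under particular lighting or shadow conditions (such as `(5, 5)` appearing in a single frame) fall below the cutoff, which is the motivation given above for sampling the scene across a full day.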

In some implementations, the object detection by frame sampling and analysis over a predefined time period, or even by analysis of a single frame, may be performed at or during one or more of a variety of instances, e.g., when the camera 118 is first installed and turned on, whenever disruption of the camera is detected (e.g., the camera 118 was physically repositioned or reoriented), and/or periodically (e.g., monthly, weekly, bi-weekly). Also, if the camera 118 is moved often (e.g., camera disruption is detected at a rate above a predefined threshold), the object detection may be held off until the camera disruption rate has been below the threshold for at least a predefined amount of time (e.g., for a day). In some implementations, a change in the scene is detected (e.g., by detecting certain pixel changes), and a determination that the camera was disturbed is made based on the detected scene change. In response to the determination that the camera was disturbed, the object detection and zone definition generation is performed again to re-detect the objects and their corresponding contours, and to update the suggested zones for the re-detected objects; the object detection and zone definitions are thus automatically updated in response to a disruption of the camera. In some implementations, an alert, notification, or prompt that the camera was disrupted and that the objects and suggested zones have been automatically updated may be provided to the user (e.g., a user may be asked to accept the updated object detection and zone definition in a user interface similar to the user interface shown in FIG. 14F).
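The hold-off rule for a frequently moved camera can be sketched as follows, under stated assumptions: disruptions are timestamps in seconds, the "rate" is the number of disruptions in a trailing window (one day here), and detection is allowed only if that count has not exceeded `max_per_window` at any moment during the last `quiet_period`. All parameter values and names are hypothetical.

```python
DAY = 86400.0


def trailing_count(disruptions, t, window):
    """Number of disruptions in the trailing window (t - window, t]."""
    return sum(1 for d in disruptions if t - window < d <= t)


def should_run_detection(disruptions, now, window=DAY, max_per_window=3,
                         quiet_period=DAY):
    """True if the disruption rate has not exceeded the limit for quiet_period.

    disruptions: timestamps (seconds) of detected camera disruptions.
    The trailing count only rises at a disruption instant, so its maximum over
    [now - quiet_period, now] is found at the start of that interval or at a
    disruption inside it.
    """
    start = now - quiet_period
    checkpoints = [start] + [d for d in disruptions if start < d <= now]
    return all(trailing_count(disruptions, t, window) <= max_per_window
               for t in checkpoints)
```

Checking only the interval start and the disruption instants is sufficient because the trailing count is a step function that increases only when a disruption occurs, so no other moment can exceed these checkpoints.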

In some implementations, the object detection may detect particular types of objects, and suggested zone definitions for these particular types of objects are automatically designated as suppression zones. For example, the video may be processed and analyzed (e.g., in the various ways described above) to detect, among various objects, television, computer, or other electronic display screens. A suggested zone definition for such a detected screen may be automatically designated as a suppression zone by default. In this manner, types of objects that are commonly associated with false alarm motion activity (e.g., electronic display screens) may be designated as suppression zones, so that the user is not notified of insignificant motion activity (e.g., motion in the programming displayed on the television or other electronic display screen). In some implementations, movement of such a display screen may be tracked (by, e.g., performing periodic object detection) and notifications from the zone corresponding to the screen are continuously suppressed. For example, movement of a laptop or tablet screen may be tracked and motion displayed on the screen is suppressed, as a user is not likely to be interested in receiving motion alerts for motion detected on an in-house electronic display regardless of whether that display is stationary or mobile.
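Default suppression for screens, and tracking of a moving screen, could be sketched like this. The set `SCREEN_TYPES`, the dictionary layout of a zone, and the choice to return `None` (leaving the designation to the user) for other object types are hypothetical; the text above only specifies the default for screen-like objects.

```python
# Hypothetical set of object types treated as electronic display screens.
SCREEN_TYPES = {"television", "computer screen", "laptop screen", "tablet screen"}


def default_designation(object_type):
    """Suppression by default for screens; None leaves the choice to the user."""
    return "suppression" if object_type.lower() in SCREEN_TYPES else None


def track_screen_zones(zones, latest_detections):
    """Move screen zones to the screens' latest detected areas.

    zones: dict name -> {"type": str, "designation": str or None, "cells": set}.
    latest_detections: dict name -> set of cells from periodic object detection.
    """
    for name, zone in zones.items():
        if zone["type"].lower() in SCREEN_TYPES and name in latest_detections:
            zone["cells"] = latest_detections[name]
    return zones
```

Because only the zone's cells are updated, its suppression designation travels with the screen, so motion shown on a laptop that has been carried to another spot stays suppressed.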

While the specification describes methods and processes as performed by particular systems or devices, it should be appreciated that the described methods and processes may be performed by a variety of appropriate systems and devices, and combinations thereof. For example, analysis of frames of the video to detect objects may be performed by a server system (e.g., server system 508/552), a camera (e.g., camera 118), a hub device (e.g., hub device 180), a camera and a server system, a camera and a hub device, or a server system and a hub device.

Example Screenshots

FIGS. 14A-14E illustrate example screenshots of user interfaces on a client device in accordance with some implementations. In some implementations, the user interfaces depicted in FIGS. 14A-14E are user interfaces for a smart home application on a client device (e.g., client device 504) or for a smart home management website displayed in a web browser application on the client device 504.

FIG. 14A illustrates a user interface with a video region and a timeline region; FIG. 14A illustrates another example of the user interface described above with reference to FIG. 12A. Video is displayed in the video region, and marks are shown in the video region to mark objects detected in the video and for which a suggested zone is defined. The timeline region shows a timeline with indications of video availability and detected events (e.g., motion events).

FIG. 14B illustrates the user interface, with a mouse pointer hovered over one of the marked objects. The marked object is highlighted; FIG. 14B illustrates another example of the user interface described above with reference to FIG. 12B.

FIG. 14C illustrates the user interface, with the contours of the marked object highlighted and a call-out for the marked object; FIG. 14C illustrates another example of the user interface described above with reference to FIG. 12C. The call-out includes one or more thumbnails of video portions associated with motion events detected as associated with the suggested zone corresponding to the marked object (e.g., the motion event occurred in the zone). In some implementations, call-outs from the timeline for respective thumbnails of the video portions are also displayed.

FIG. 14D illustrates the user interface, with the mouse pointer positioned over one of the thumbnails in the call-out for the marked object; FIG. 14D illustrates another example of the user interface described above with reference to FIG. 12D. The user may click on that thumbnail to select it.

In response to selection of the thumbnail, the video portion corresponding to the thumbnail is played in the video region, as shown in FIG. 14E; FIG. 14E illustrates another example of the user interface described above with reference to FIG. 12E.

FIG. 14F illustrates an example prompt that may be presented to a user when a suggested zone for an object is defined. In this prompt, the particular object that is detected is a door, which is highlighted in a frame from the video feed included in the prompt. In some implementations, a user interface similar to the one shown in FIG. 14F may be displayed when an object is detected in accordance with the methods and processes described above.

FIG. 14G illustrates an example notification that may be presented to a user when motion activity is detected in a zone corresponding to a detected object. The notification names the particular zone corresponding to the detected object in which motion activity was detected, and may also include a frame from the portion of the video with the detected motion activity.

It will also be understood that, although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first user interface could be termed a second user interface, and, similarly, a second user interface could be termed a first user interface, without departing from the scope of the various described implementations. The first user interface and the second user interface are both types of user interfaces, but they are not the same user interface.

The terminology used in the description of the various described implementations herein is for the purpose of describing particular implementations only and is not intended to be limiting. As used in the description of the various described implementations and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

As used herein, the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.

It is to be appreciated that “smart home environments” may refer to smart environments for homes such as a single-family house, but the scope of the present teachings is not so limited. The present teachings are also applicable, without limitation, to duplexes, townhomes, multi-unit apartment buildings, hotels, retail stores, office buildings, industrial buildings, and more generally any living space or work space.

It is also to be appreciated that while the terms user, customer, installer, homeowner, occupant, guest, tenant, landlord, repair person, and the like may be used to refer to the person or persons acting in the context of some particular situations described herein, these references do not limit the scope of the present teachings with respect to the person or persons who are performing such actions. Thus, for example, the terms user, customer, purchaser, installer, subscriber, and homeowner may often refer to the same person in the case of a single-family residential dwelling, because the head of the household is often the person who makes the purchasing decision, buys the unit, and installs and configures the unit, and is also one of the users of the unit. However, in other scenarios, such as a landlord-tenant environment, the customer may be the landlord with respect to purchasing the unit, the installer may be a local apartment supervisor, a first user may be the tenant, and a second user may again be the landlord with respect to remote control functionality. Importantly, while the identity of the person performing the action may be germane to a particular advantage provided by one or more of the implementations, such identity should not be construed in the descriptions that follow as necessarily limiting the scope of the present teachings to those particular individuals having those particular identities.

For situations in which the systems discussed above collect information about users, the users may be provided with an opportunity to opt in/out of programs or features that may collect personal information (e.g., information about a user's preferences or usage of a smart device). In addition, in some implementations, certain data may be anonymized in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be anonymized so that the personally identifiable information cannot be determined for or associated with the user, and so that user preferences or user interactions are generalized (for example, generalized based on user demographics) rather than associated with a particular user.

Although some of various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.

The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the scope of the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen in order to best explain the principles underlying the claims and their practical applications, to thereby enable others skilled in the art to best use the implementations with various modifications as are suited to the particular uses contemplated.

What is claimed is:
 1. A method, comprising: at a computing systemhaving one or more processors, and one or more memory components storingone or more programs for execution by the one or more processors:obtaining video of an environment including a plurality of objects;defining a zone including a portion of the environment; subsequent tothe defining, detecting a motion event captured in the video occurringat least partially within the zone, wherein the motion event isassociated with a first object of the plurality of objects; identifyingan object type of the first object; and based on the object type of thefirst object, causing a notification of the motion event to be issued ornot issued.
 2. The method of claim 1, wherein the object type of thefirst object is identified as a person; and the method comprises:causing the notification of the motion event to be issued based on theobject type of the first object being a person.
 3. The method of claim 1, wherein the object type of the first object is identified as an object other than a person; and the method comprises: causing the notification of the motion event to not be issued based on the object type of the first object being other than a person.
 4. The method of claim 1, wherein defining the zone includes identifying a source zone and/or a sink zone in a field of view included in the video.
 5. The method of claim 4, wherein identifying a source zone comprises: determining a number of the motion events that originated from a first region of the field of view within a predefined period of time; determining whether the number of originated motion events exceeds a first predefined threshold; and in accordance with a determination that the number of originated motion events exceeds the first predefined threshold, identifying the region of the field of view as a source zone.
 6. The method of claim 4, wherein identifying a sink zone comprises: determining a number of the motion events that terminated at a second region of the field of view within a predefined period of time; determining whether the number of terminated motion events exceeds a second predefined threshold; and in accordance with a determination that the number of terminated motion events exceeds the second predefined threshold, identifying the region of the field of view as a sink zone.
 7. A computing system, comprising: one or more processors; one or more memory components; and one or more programs stored in the one or more memory components and configured for execution by the one or more processors, the one or more programs comprising instructions for: obtaining video of an environment including a plurality of objects; defining a zone including a portion of the environment; subsequent to the defining, detecting a motion event captured in the video occurring at least partially within the zone, wherein the motion event is associated with a first object of the plurality of objects; identifying an object type of the first object; and based on the object type of the first object, causing a notification of the motion event to be issued or not issued.
 8. The computing system of claim 7, wherein the object type of the first object is identified as a person; and the one or more programs comprise instructions for: causing the notification of the motion event to be issued based on the object type of the first object being a person.
 9. The computing system of claim 7, wherein the object type of the first object is identified as an object other than a person; and the one or more programs comprise instructions for: causing the notification of the motion event to not be issued based on the object type of the first object being other than a person.
 10. The computing system of claim 7, wherein the instructions for defining the zone include instructions for identifying a source zone and/or a sink zone in a field of view included in the video.
 11. The computing system of claim 10, wherein the instructions for identifying a source zone comprise instructions for: determining a number of the motion events that originated from a first region of the field of view within a predefined period of time; determining whether the number of originated motion events exceeds a first predefined threshold; and in accordance with a determination that the number of originated motion events exceeds the first predefined threshold, identifying the region of the field of view as a source zone.
 12. The computing system of claim 10, wherein the instructions for identifying a sink zone comprise instructions for: determining a number of the motion events that terminated at a second region of the field of view within a predefined period of time; determining whether the number of terminated motion events exceeds a second predefined threshold; and in accordance with a determination that the number of terminated motion events exceeds the second predefined threshold, identifying the region of the field of view as a sink zone.
 13. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which, when executed by a computing system having one or more processors, cause the computing system to perform operations comprising: obtaining video of an environment including a plurality of objects; defining a zone including a portion of the environment; subsequent to the defining, detecting a motion event captured in the video occurring at least partially within the zone, wherein the motion event is associated with a first object of the plurality of objects; identifying an object type of the first object; and based on the object type of the first object, causing a notification of the motion event to be issued or not issued.
 14. The non-transitory computer readable storage medium of claim 13, wherein the object type of the first object is identified as a person; and the operations further comprise: causing the notification of the motion event to be issued based on the object type of the first object being a person.
 15. The non-transitory computer readable storage medium of claim 13, wherein the object type of the first object is identified as an object other than a person; and the operations further comprise: causing the notification of the motion event to not be issued based on the object type of the first object being other than a person.
 16. The non-transitory computer readable storage medium of claim 13, wherein defining the zone includes identifying a source zone and/or a sink zone in a field of view included in the video.
 17. The non-transitory computer readable storage medium of claim 16, wherein identifying a source zone comprises: determining a number of the motion events that originated from a first region of the field of view within a predefined period of time; determining whether the number of originated motion events exceeds a first predefined threshold; and in accordance with a determination that the number of originated motion events exceeds the first predefined threshold, identifying the region of the field of view as a source zone.
 18. The non-transitory computer readable storage medium of claim 16, wherein identifying a sink zone comprises: determining a number of the motion events that terminated at a second region of the field of view within a predefined period of time; determining whether the number of terminated motion events exceeds a second predefined threshold; and in accordance with a determination that the number of terminated motion events exceeds the second predefined threshold, identifying the region of the field of view as a sink zone.
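The claims recite two computations: deciding whether a notification for a motion event is issued based on the object type of the associated object (claims 1 to 3 and their system and medium counterparts), and identifying source and sink zones by counting motion events that originate from, or terminate at, a region within a predefined period of time (claims 4 to 6 and counterparts). The following is a minimal, illustrative Python sketch of both under stated assumptions, not the claimed implementation. It assumes that motion events have already been tracked and classified upstream, that the field of view is divided into discrete labeled regions, and that a zone is a set of such regions. The `MotionEvent` record, the functions `identify_zones` and `should_notify`, and the "notify only for persons" default policy are all hypothetical.

```python
# Illustrative sketch only. MotionEvent, its fields, the region labels, and
# both function names are hypothetical; they are not taken from the claims.
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Iterable, Set, Tuple


@dataclass(frozen=True)
class MotionEvent:
    """Hypothetical record for one tracked and classified motion event."""
    start_region: Hashable         # region of the field of view where the event originated
    end_region: Hashable           # region of the field of view where the event terminated
    timestamp: float               # event start time, in seconds
    object_type: str               # label from an assumed upstream classifier
    regions: FrozenSet[Hashable]   # every region the event passed through


def identify_zones(
    events: Iterable[MotionEvent],
    window_start: float,
    window_end: float,
    source_threshold: int,
    sink_threshold: int,
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Count originations and terminations per region inside [window_start,
    window_end); a region whose count exceeds its threshold becomes a source
    or sink zone."""
    in_window = [e for e in events if window_start <= e.timestamp < window_end]
    origins = Counter(e.start_region for e in in_window)
    terminations = Counter(e.end_region for e in in_window)
    # Strict ">" mirrors the "exceeds a ... predefined threshold" wording.
    sources = {r for r, n in origins.items() if n > source_threshold}
    sinks = {r for r, n in terminations.items() if n > sink_threshold}
    return sources, sinks


def should_notify(
    event: MotionEvent,
    zone: Set[Hashable],
    notify_types: FrozenSet[str] = frozenset({"person"}),
) -> bool:
    """Issue a notification only for an event that is at least partially
    within the zone and whose object type is in notify_types."""
    if not (event.regions & zone):
        return False  # the event never entered the zone
    return event.object_type in notify_types


if __name__ == "__main__":
    events = [
        MotionEvent("door", "street", t, "person", frozenset({"door", "path", "street"}))
        for t in (10.0, 20.0, 30.0)
    ]
    events.append(MotionEvent("yard", "door", 40.0, "animal", frozenset({"yard", "door"})))

    sources, sinks = identify_zones(events, 0.0, 3600.0, source_threshold=2, sink_threshold=2)
    zone = sources | sinks
    print(sources, sinks)                  # {'door'} {'street'}
    print(should_notify(events[0], zone))  # True: a person entered the zone
    print(should_notify(events[3], zone))  # False: an animal, not a person
```

The period and both thresholds are left as plain parameters because the claims do not fix their values; the notification policy is likewise a parameter, so a caller could add other object types to `notify_types` without changing the zone logic.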